FINAL YEAR 1 REPORT
Contract  DAAH01-89-C-0418 
31  March  1989  -  31  March  1992 
Period  Covered:  April  1989  -  March  1990 
Date:  March  1990 

OPTICAL  NEURAL  NETS 
FOR  SCENE  ANALYSIS 

ABSTRACT 

Our  objective  is  to  develop  new  neural  net  algorithms  and  architectures  for  scene  analysis 
and  to  demonstrate  them  on  a  fabricated  new  hardware  laboratory  neural  net.  Our  approach 
marries  pattern  recognition  and  neural  net  techniques  and  optical/digital  technologies.  Our 
hardware  laboratory  system  uses  digital  and  optical  neural  net  hardware  in  an  analog  neural  net. 
Our  algorithms  are  intended  to  be  useful  on  such  low  accuracy  analog  hardware.  Our  algorithms 
cover  a  wide  range  of  neural  net  algorithms  and  architectures.  These  can  all  be  utilized  on  the 
same  laboratory  hardware.  Our  algorithms  include  five  new  optimization  neural  nets  (matrix- 
inversion,  mixture,  multitarget  tracking,  symbolic,  and  production  system  neural  nets)  and  an 
adaptive neural net (adaptive clustering neural net).
1. INTRODUCTION


1.1  OBJECTIVE 

Our  objective  is  to  produce  new  neural  net  algorithms  and  architectures  for  use  in  scene 
analysis.  We  also  intend  to  demonstrate  these  algorithms  in  real-time  on  a  new  hybrid 
laboratory  system.  We  consider  a  number  of  optimization  neural  nets  and  one  adaptive  neural 
net.  A  unique  aspect  of  our  work  is  that  all  of  these  neural  net  algorithms  can  be  implemented 
in  real-time  on  the  same  multifunctional  neural  net  hardware  laboratory  system.  Another  unique 
aspect  of  our  work  is  its  attention  to  marrying  both  pattern  recognition  and  neural  net 
techniques.  We  view  a  major  property  of  neural  nets  to  be  their  ability  to  produce  nonlinear 
decision  surfaces  (as  are  needed  for  difficult  pattern  recognition  problems  -  those  for  which 
neural  nets  are  required). 

1.2  APPROACH 

Our  approach  is  hybrid  and  multidisciplinary.  We  marry  pattern  recognition  and  neural 
net  techniques.  We  also  marry  optical  and  digital  technologies.  A  hybrid  neural  net  thus 
results.  We  also  concentrate  on  the  use  of  one  basic  hybrid  architecture  that  is  useful  for 
implementing  various  optimization  and  adaptive  neural  nets.  Our  work  thus  distinguishes 
between  these  two  general  classes  of  neural  nets  (optimization  and  adaptive)  with  both  being 
realizable  on  the  same  basic  hybrid  architecture. 

1.3  OVERVIEW 

Chapter  2  provides  an  overview  of  our  processor  with  attention  to  its  multifunctional 
nature  and  its  general  architecture.  Chapter  3  details  the  present  status  of  the  system,  and  our 
present  simulation  status  of  it  for  one  specific  neural  net  (mixture  neural  net).  This  is  the  first 
meaningful  neural  net  simulation. 

Chapters  4-9  then  detail  our  five  specific  optimization  neural  nets.  Chapter  4  presents  our 
mixture  neural  net  (applied  to  an  imaging  spectrometer  case  study).  Chapters  5  and  6  present 
our  multitarget  tracking  neural  net  work  with  attention  to  a  cubic  energy  neural  net  (Chapter  5) 
and  a  preferable  quadratic  energy  neural  net  (Chapter  6)  requiring  a  simpler  optical  processor. 
Chapters  7-9  summarize  simulated  (Chapter  7)  and  laboratory  (Chapter  8)  data  on  our  symbolic 
and  production  system  neural  nets  based  on  our  initial  concepts  (Chapter  9). 

Our  adaptive  neural  net  research  is  included  in  our  extensive  summary  of  problems  in 
present  neural  nets  and  a  new  adaptive  clustering  neural  net  using  pattern  recognition  and 
neural  net  techniques  (Chapter  10). 

Section  1.4  provides  a  summary  of  the  various  neural  nets  we  have  considered.  Papers 
published  and  submitted  during  the  first  year  of  this  project  follow  in  Section  1.5.  These  10 
papers  represent  an  enormous  one  year  output  and  indicate  the  completeness  with  which  we  have 
treated  all  aspects  of  neural  nets  for  scene  analysis  with  attention  to  new  algorithms  and 
applications,  a  combination  of  optical/digital  techniques  for  implementation,  a  multifunctional 
hybrid  optical/digital  neural  net  architecture,  its  laboratory  realization  and  a  new  adaptive 
clustering  neural  net  algorithm  (combining  pattern  recognition  and  neural  net  techniques).  This 
effort  is  thus  quite  significant  in  terms  of  algorithms,  architectures,  and  hardware. 


1.4  SUMMARY  OF  NEURAL  NETS  (NNs)  CONSIDERED 

The  seven  neural  nets  we  have  considered  are  now  briefly  summarized. 


The  input  neurons  to  the  production  system  NN  are  facts  (antecedents  and  consequents). 
Objects  and  object  parts  are  used  in  our  initial  work.  Surface  types  for  object  parts  (cylinder, 
sphere,  valley,  ridge,  etc.)  can  also  be  used  in  future  work.  The  objects  are  typical  of  those 
present  in  various  scenes.  The  weights  define  the  rules.  These  are  initially  posed  as  if-then 
statements,  with  all  rules  written  as  the  AND  of  several  antecedents  and  the  OR  of  several  such 
sets  of  antecedents.  The  output  neurons  that  fire  represent  the  new  facts  that  are  now  learned  to 
be  true.  As  the  system  iterates,  it  learns  new  rules  and  infers  new  results  on  the  present  input 
data.  We  initially  consider  a  propositional  calculus  system  (with  all  parameters  being  exact 
terms)  and  then  plan  to  address  a  predicate  calculus  system  (with  parameters  being  variables) 
that  is  much  more  powerful. 

The  E  input  neurons  in  our  mixture  NN  each  correspond  to  the  fractional  amounts  of  E 
elements  present  in  a  mixture  of  elements  within  one  region  of  an  input  scene.  The  outputs  from 
two  matrix-vector  multiplications  are  combined  to  form  the  new  neuron  states.  After  a  number 
of  iterations,  the  final  neuron  states  denote  the  fractional  amount  of  each  element  present  in  the 
input  mixture. 

The  matrix  inversion  NN  produces  the  inverse  of  a  matrix  that  is  given  to  the  processor. 
To  calculate  the  inverse  X  of  a  matrix  Q,  we  realize  that  QX  =  I.  We  formulate  the  solution 
(the elements of the inverse of Q) as the minimization of an energy function. We solve for the X
that  minimizes  the  energy  function  on  a  neural  net.  The  matrix  elements  (weights)  in  this  NN 
have  an  attractive  block  Toeplitz  form  and  thus  acousto-optic  (AO)  architectures  should  be  very 
suitable  for  implementing  this  NN.  This  represents  the  first  AO  NN.  Since  matrix  inversions 
are  required  in  many  pattern  recognition  linear  discriminant  function  designs  and  in  most 
adaptive  algorithms,  this  NN  should  have  general  computational  use  in  image  processing  (as  well 
as  in  adaptive  radar,  control,  etc.). 

The  cubic  energy  NN  for  MTT  takes  measurements  on  objects  in  each  of  three  frames  and 
it  assigns  one  target  per  measurement  and  time  frame.  This  is  useful  for  time  sequential  scene 
analysis  to  associate  objects  (or  object  parts)  in  several  time  frames. 

The  quadratic  energy  MTT  NN  is  a  simplified  version  of  the  cubic  energy  NN.  It  processes 
pairs  of  time  frames.  The  resultant  optical  architecture  is  much  simpler  than  the  cubic  energy 
NN  and  significantly  reduces  component  requirements. 

The  symbolic  NN  combines  a  symbolic  correlator,  production  system  NN,  feature  extractor 
and  image  processing  NN.  Its  major  advantage  is  the  ability  to  process  multiple  objects  in  the 
field  of  view  (this  is  achieved  by  the  symbolic  correlator).  No  other  NN  has  this  ability.  It 
outputs  a  symbolic  description  of  each  region  of  the  input  that  denotes  which  generic  shapes  are 
present  and  their  location.  These  data  are  then  symbolically  encoded  and  fed  to  an  NN.  The 
NN  is  unique  because  of  its  symbolic  input  neuron  representation.  Alternatively,  the  locations  of 
regions  of  interest  in  the  input  scene  are  used  to  guide  the  positioning  of  window  functions  (for 
segmentation)  from  which  input  features  are  extracted  and  subsequently  fed  to  an  NN  for  object 
classification.  These  NNs  again  combine  pattern  recognition  and  NN  techniques. 


The adaptive clustering NN is our major effort. The input neurons are features, the hidden
layer  neurons  are  prototypes  of  the  various  classes  of  objects  and  the  output  neurons  denote  the 
class  of  the  input  object.  Clustering  techniques  are  used  to  select  the  original  hidden  layer 
neurons  (we  allow  several  neurons  or  clusters  per  object  class)  and  hence  the  initial  input  to 
hidden  layer  weights.  These  represent  a  set  of  linear  discriminant  functions  (LDFs).  The  output 
neurons  define  the  class  of  the  input.  The  hidden  to  output  layer  weights  map  the  clusters  to 
classes.  Our  study  of  criterion  functions  determined  the  type  of  error  function  used  to  train  the 
NN.  Thus,  advanced  pattern  recognition  techniques  are  used  to  initialize  the  set  of  NN  weights. 
A  new  adaptive  NN  learning  algorithm  is  then  used  to  refine  and  improve  the  initial  weight 
estimates  and  to  produce  the  LDF  combinations  that  provide  the  nonlinear  piecewise 
discriminant  surfaces  finally  used.  This  is  the  adaptive  learning  stage.  This  new  NN  combines 
pattern  recognition  and  NN  techniques. 
ABSTRACT


A  multi-functional  hybrid  neural  net  is  described.  It  is  hybrid  since  it  uses  a  digital  hardware 
Hecht-Nielsen  Corporation  (HNC)  neural  net  for  adaptive  learning  and  an  optical  neural  net  for  on-line 
processing/classification.  It  is  also  hybrid  in  its  combination  of  pattern  recognition  and  neural  net 
techniques. The system is multi-functional: it can function as an optimization and adaptive pattern
recognition neural net, as well as an auto- and heteroassociative processor.

1. INTRODUCTION


Neural nets (NNs) have recently received enormous attention [1-2], with increasing attention to
the use of optical processors and a variety of new learning algorithms. Section 2 describes our
hybrid NN with attention to its fabrication and the roles of optical and digital processors. Section 3
details its use as an associative processor. Section 4 highlights its use in 3 optimization NN problems
(a mixture NN, a multitarget tracker (MTT) NN, and a matrix inversion NN). Section 5 briefly notes its
use as a production system NN and symbolic NN. Section 6 describes its use as an adaptive pattern
recognition (PR) NN (that marries PR and NN techniques).

2.  HYBRID  ARCHITECTURE 


Figure 1 shows our basic hybrid NN [3]. The optical portion of the system is a matrix-vector
(M-V) processor whose vector output P3 is the product of the vector at P1 and the matrix at P2. An
HNC digital hardware NN is used during learning to determine the interconnection weights for P2. If P2
is a spatial light modulator (SLM), its contents can be updated (using gated learning) from the digital
NN. The operations in most adaptive PR NN learning algorithms are sufficiently complex that they are
best implemented digitally. In addition, the learning operations required are often not well suited for
optical realization. For optimization NNs, the weights are fixed; in adaptive learning, learning is
off-line and, once it is completed, the weights can often be fixed.

Four gates are shown that determine the final output or the new P1 input neurons (depending on
the application). We briefly discuss these cases now and detail how each arises in subsequent
sections. In most optimization NNs, an external vector a is added to the P3 output (Gate 1 achieves
this). In all NNs, a nonlinear thresholding (P3 outputs are 0 or 1), truncation (all P3 outputs lie between
0 and 1), or maximum selection (the maximum P3 output is set to 1 and all other P3 outputs to 0)
operation is required (Gate 2 achieves this). In a 3-layer NN (adaptive PR), a second M-V operation is
performed with the P3 neurons as inputs (Gate 3 achieves this). When the neuron outputs are fed
back to the P1 input neurons (in an iterative NN), linear or binary P1 neurons may be used (Gate 4
produces the proper P1 input neurons).
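
As a rough illustration of the gate operations, the following Python sketch models one pass through the M-V processor with an optional external vector (Gate 1) and the three Gate 2 nonlinearities. The function names and the threshold level of 0.5 are assumptions made for this sketch; they are not taken from the hardware.

```python
def threshold(p3, level=0.5):
    """Gate 2, thresholding: each output becomes 0 or 1 (level is an assumed value)."""
    return [1.0 if v > level else 0.0 for v in p3]


def truncate(p3):
    """Gate 2, truncation: each output is limited to the range [0, 1]."""
    return [min(1.0, max(0.0, v)) for v in p3]


def max_select(p3):
    """Gate 2, maximum selection: the largest output is set to 1, all others to 0."""
    winner = max(range(len(p3)), key=lambda i: p3[i])
    return [1.0 if i == winner else 0.0 for i in range(len(p3))]


def one_pass(p2, p1, a=None, gate2=truncate):
    """One pass: M-V product of P2 and P1, optional Gate 1 addition, then Gate 2."""
    p3 = [sum(w * x for w, x in zip(row, p1)) for row in p2]
    if a is not None:
        p3 = [v + ai for v, ai in zip(p3, a)]  # Gate 1: add the external vector
    return gate2(p3)


# Usage: a 2x2 weight mask, a 2-element input, and an external vector.
outputs = one_pass([[1.0, 0.0], [0.0, 2.0]], [0.5, 0.9], a=[0.1, 0.0])
```

In an iterative NN the returned values would be fed back as the next P1 input; Gate 4 then only decides whether those values are kept linear or made binary.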

Figure  2  shows  the  optical  portion  of  the  NN  as  fabricated.  A  stripe  electroded  liquid  crystal 
device  (LCD)  serves  as  P1  and  a  computer-generated  hologram  (CGH)  as  the  P2  set  of 
interconnection  weights  [4]  (this  allows  real-time  realization  on  an  SLM  with  minimum  space 
bandwidth  product  and/or  improved  light  efficiency  to  avoid  the  large  insertion  loss  of  a  standard  M-V 
processor).  An  IBM  PC/AT,  DT  2821  data  acquisition  board,  and  various  special-purpose  hardware 
provide  all  digital/electronic  support  functions. 

3.  ASSOCIATIVE  PROCESSOR  (AP)  USES 


The first type of NN we consider is an AP. We consider pseudoinverse [6] and other advanced
APs (such as the Ho-Kashyap AP) [5], since they have more storage capacity and better noise
performance than Hopfield and other APs. They also require only one pass through the P1-P3 system
(rather than many iterations, as in other APs). Thus, these systems use only the P1-P3 M-V processor
(the P1 input is the key vector, the matrix at P2 is fixed, and the P3 output is the recollection vector
most closely associated with the input key vector). We emphasize heteroassociative processors
(HAPs), since they make decisions (i.e. the recollection vector encoding denotes the class of the
input key vector data). We also employ P1 input neuron spaces that are features, facts, or symbolic
descriptions of an object (this significantly reduces the dimensionality required, i.e. the number of
neurons). With only 32 input neurons, we have recognized over 1000 distorted input objects in 10
classes. For these neural nets, the P1 neurons are linear, the P2 weights are analog, and the P3
neurons are binary (or use maximum selection). We note that P2 can also be the data matrix, in which
case a nearest neighbor AP NN results.

NEURON  REPRESENTATION  SPACE 

A  key  issue  in  all  of  our  NN  AP  systems  has  been  the  use  of  a  variety  of  neuron  representation 
spaces.  These  include:  iconic  (one  neuron  per  pixel  in  an  image),  feature  spaces,  symbolic  (facts 
etc.)  data,  and  encoded  correlator  output  data.  Iconic  neuron  spaces  require  a  very  large  number  of 
neurons  and  are  thus  not  attractive.  They  also  result  in  neuron  spaces  that  are  not  distortion  or  shift 
invariant  (i.e.  they  require  many  training  images,  one  for  each  possible  shift  or  distortion).  Feature 
space  representations  result  in  fewer  neurons  (a  considerable  reduction  in  dimensionality)  with 
various  levels  of  distortion  and/or  shift  invariance  and  are  thus  very  attractive  and  preferable. 
Symbolic and correlator representations allow multiple objects to be handled (all other neural
systems require preprocessing to isolate one object in the field of view before inputting the data to an
NN for processing).


4.  OPTIMIZATION  NEURAL  NETS 

Optimization NNs (rather than PR NNs) are a major class of NNs. In most such cases, these NNs
are characterized by iterative M-V processors with a fixed P2 set of interconnection weights, the
addition of an external vector a to the P3 output, and a nonlinear function (before the output is fed
back to P1). The fixed P2 weights allow us to efficiently use the optical portion of Figure 1 with P2
being a film-based (CGH etc.) mask. We briefly note 3 optimization NNs and discuss their realization
on Figure 1.

4.1  MIXTURE  NN 

In this case, the input signal c is the sum of a number of reference functions $k_e$ ($e = 1, \ldots, E$
references exist, plus unknowns)

$$c = \sum_e x_e k_e. \qquad (1)$$

We desire to find the fractional amounts $x_e$ of those references $k_e$ present. This problem arises in the
analysis of imaging spectrometry data [7] and in other cases. We minimize the MSE function
$E_1 = \sum_n (c_n - \sum_e x_e k_{ne})^2$ and the constraint $E_2 = (\sum_e x_e - 1)^2$ that the sum of the $x_e$ is unity.
We also ensure that all $x_e$ satisfy $0 \le x_e \le 1$. The solution x using the neural evolution equation

$$dx/dt \propto -\partial E(x)/\partial x \qquad (2)$$

can be written as the matrix-vector equation (for discrete time t)

$$x(t+1) = \phi[T x(t) + a], \qquad (3)$$

where $\phi$ enforces $0 \le x_e \le 1$, the matrix $T = K^T K + J$ is fixed (K is the data matrix of the references,
$K = [k_1 \cdots k_E]$, and $K^T K$ is the VIP (vector inner product) matrix of reference data), and the vector
$a = K^T c$ is known (it varies with the input, but is fixed during the iterations in Eq. (3)). To implement (3) on
Figure 1, we input x to P1 and the fixed film mask P2 is T. We add a to P3 (through Gate 1) and apply $\phi$
through Gate 2 to produce the new P1 neuron values (linear input neurons are formed using Gate 4).
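
The mixture iteration can be sketched in Python as projected gradient descent on $E_1 + E_2$. This is a minimal sketch under stated assumptions, not the processor's exact update: J is taken to be the all-ones matrix arising from the sum-to-one constraint, the step size `eta` and the explicit gradient sign convention are choices made here, and the reference data are synthetic.

```python
def mixture_nn(K, c, eta=0.05, iters=5000):
    """Estimate fractions x from signal c, where K[n][e] is reference e at sample n."""
    N, E = len(K), len(K[0])
    # Fixed interconnection matrix K^T K + J (J assumed to be all ones).
    T = [[sum(K[n][i] * K[n][j] for n in range(N)) + 1.0 for j in range(E)]
         for i in range(E)]
    # Input-dependent vector: K^T c, plus 1 from the sum-to-one constraint.
    a = [sum(K[n][i] * c[n] for n in range(N)) + 1.0 for i in range(E)]
    x = [1.0 / E] * E  # start from equal fractions
    for _ in range(iters):
        Tx = [sum(T[i][j] * x[j] for j in range(E)) for i in range(E)]
        # Gradient step on E1 + E2, then truncation to [0, 1] (the Gate 2 role).
        x = [min(1.0, max(0.0, x[i] - 2.0 * eta * (Tx[i] - a[i])))
             for i in range(E)]
    return x


# Synthetic example: 3 references sampled at 4 wavelengths (made-up values).
K = [[0.9, 0.1, 0.2],
     [0.1, 0.8, 0.2],
     [0.2, 0.1, 0.9],
     [0.1, 0.3, 0.7]]
x_true = [0.2, 0.5, 0.3]
c = [sum(K[n][e] * x_true[e] for e in range(3)) for n in range(4)]
x_est = mixture_nn(K, c)
```

Each pass uses one matrix-vector product, one vector addition and one truncation, which is the same sequence of operations the optical M-V processor and Gates 1 and 2 provide.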

For  this  NN  application,  we  have  modeled  the  various  error  sources  in  the  optical  processor  (with 
a  random  variable  with  a  given  standard  deviation  denoting  various  analog  optical  system  accuracies 
and  noise  sources).  We  find  [3]  that  the  error  in  the  P2  weights  is  the  most  dominant  error  source 
(together  with  the  uniformity  of  the  P1  illumination).  With  proper  P2  mask  encoding  and  correction  for 
P1 illumination, sufficient accuracy exists. We have tested the algorithm with various mixtures of
minerals (where $k_{ne}$ is the reflectance spectrum of mineral e at wavelength n and c is the sum
of several such reference signals). Table 1 shows the results obtained for only one element present
(pure with both small and large grain size) and with mixtures of 10 different elements with different
amounts of noise present in the composite input signal c. The average number of iterations and the
worst-case error in any $x_e$ are noted.

4.2  MATRIX-INVERSION  NN 


This NN produces the inverse of a matrix directly [8]. This operation is vital for various real time
phased array radar, signal processing, PR, and NN applications. Consider calculating the inverse of
the matrix Q with elements $Q_{ab}$. The values of the neurons $X_{ab}$ will denote the elements of $Q^{-1}$. We
solve this by minimizing the MSE function

$$E = \frac{1}{2} \sum_a \sum_c \Big( \sum_b Q_{ab} X_{bc} - \delta_{ac} \Big)^2. \qquad (4)$$

When $E = 0$, $QX = I$ and $X = Q^{-1}$. Substituting into the neuron state time evolution equation

$$\partial X_{ab}/\partial t = -\eta \, \partial E/\partial X_{ab} = -\eta \Big[ \sum_c \sum_d Q_{ca} Q_{cd} X_{db} - Q_{ba} \Big] \qquad (5)$$

and discretizing time $\delta t$ with $\lambda = -\eta \, \delta t$, where n is the time index, we obtain

$$X_{ab}(n+1) = X_{ab}(n) + \lambda \Big[ \sum_c \sum_d Q_{ca} Q_{cd} X_{db}(n) - Q_{ba} \Big]. \qquad (6)$$

We write this as a matrix-vector equation

$$x(n+1) = x(n) + \lambda [T x(n) - s] \qquad (7)$$

where x is a vector of $N^2$ elements or neurons (the $N \times N$ elements of $Q^{-1}$), T is an $N^2 \times N^2$
interconnection matrix, and s is the 1-D list of the elements $Q_{ba}$.

From (7), we see that this NN formulation is similar to (3) for the mixture NN. It can thus be
implemented on Figure 1 as before. The x(n) term on the right hand side of (7) can be included in the
M-V product T x by the addition of ones to the diagonal of T. Thus the new x(n+1) values are given by
an M-V product with the prior x(n) vector with a vector s subtracted from the result. The vector s
subtraction is performed through Gate 1, with linear neuron values for the next n+1 time step produced
using Gate 4. The only notable exception is that the matrix T weights now vary as a function of the
matrix Q being inverted (by comparison, the matrix in the mixture NN is fixed for a given database).
The block Toeplitz structure of T allows for a very novel and efficient acousto-optic NN realization
[8]. However, here we emphasize the realization of a variety of NNs on the same architecture (Figure
1) with a 2-D SLM (or film) at P2.
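
A minimal Python sketch of this iteration is given below. It works directly on the N x N neuron array rather than on the vectorized $N^2$-element form, and it uses a positive step size `mu` (playing the role of $-\lambda$) whose value is an assumption chosen so that the iteration converges for the small test matrix shown.

```python
def invert_nn(Q, mu=0.1, iters=500):
    """Iteratively estimate X = Q^{-1} by gradient descent on the MSE of QX - I."""
    N = len(Q)
    X = [[0.0] * N for _ in range(N)]  # all neurons start at zero
    for _ in range(iters):
        # Residual R = QX - I.
        R = [[sum(Q[a][b] * X[b][c] for b in range(N)) - (1.0 if a == c else 0.0)
              for c in range(N)] for a in range(N)]
        # Gradient Q^T R; element (a, b) equals sum_c sum_d Q_ca Q_cd X_db - Q_ba.
        X = [[X[a][b] - mu * sum(Q[c][a] * R[c][b] for c in range(N))
              for b in range(N)] for a in range(N)]
    return X


# Usage with a small, well-conditioned matrix (illustrative values).
Q = [[2.0, 1.0],
     [1.0, 3.0]]
X = invert_nn(Q)
```

The step size must stay below 2 divided by the largest eigenvalue of $Q^T Q$ for this plain gradient iteration to converge, so in practice it would be chosen per matrix.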

Table 2 summarizes results obtained with various matrices. As seen, the number of iterations
required is modest and the MSE accuracy of the resultant $Q^{-1}$ inverse is generally within the 1%
accuracy expected from an analog processor. This algorithm thus appears very attractive since
round-off errors do not accumulate and a meaningful result (with 1% accuracy) is obtained with a 1%
accurate analog optical processor.

4.3  MULTITARGET  TRACKER  (MTT)  NEURAL  NET 

We devised, described, and simulated an MTT NN using only position sensor data. This system
[9] resulted in a new NN that minimized a cubic energy function. The optical architecture required
multiplication of a matrix (the vector outer product (VOP) of the present neuron state) times a tensor.
Our algorithm and architecture reduced the required 2-D space bandwidth product for the tensor by a
factor of over 1000. Although this cubic energy function NN algorithm is very attractive, it requires a
new tensor for each new set of measurements. Instead of using measurement data over 3 time
frames, we devised a new quadratic energy MTT NN using range and velocity sensor data. This is
preferable, since it results in a simpler NN system using a fixed 2-D mask (rather than a real time high
bandwidth 2-D or 3-D SLM). Both of these NNs are measurement based, require fewer input neurons
than other MTT NNs, and require neither track initiation nor a Kalman filter or extended Kalman filter
post processor. We now describe our quadratic MTT NN algorithm, its realization on Figure 1, and initial
data results.

We assume $N_m$ measurements in two sequential time frames i and j. The objective of the
processor is to assign each measurement at time i to a measurement at time j. We use $N_m^2$ neurons
$X_{ij}$. We use the differences $D_{ij}$ between all measurement pairs. The energy or error function to be
minimized with respect to the neuron state X is

$$E(X) = c_1 \sum_i \sum_j X_{ij} D_{ij} + c_2 \sum_i \Big( \sum_j X_{ij} - 1 \Big)^2 + c_3 \sum_j \Big( \sum_i X_{ij} - 1 \Big)^2. \qquad (8)$$

Term 1 is a minimum when measurements i and j in two time frames are the most similar (term 1
reduces the strengths of neurons associated with large $D_{ij}$). Terms 2 and 3 ensure that for each
measurement i (j) in frame 1 (2) there is only one measurement j (i) in frame 2 (1) associated with it.
The weights $c_1$-$c_3$ are chosen to emphasize the term desired (dependent upon sensor properties,
scenarios, etc.). We use the Hopfield neural evolution equation

$$X_{kl}(n+1) = X_{kl}(n) - \eta \, \Delta X_{kl} \qquad (9)$$

where n is the discrete time index, $\eta$ is the step size, and from (8) (shown for $c_1 = c_2 = c_3 = 1$)

$$\Delta X_{kl} = \partial E(X)/\partial X_{kl} = D_{kl} + 2 \Big( \sum_j X_{kj} - 1 \Big) + 2 \Big( \sum_i X_{il} - 1 \Big). \qquad (10)$$

We write (10) as the matrix-vector equation

$$\Delta X_i = \sum_j M_{ij} X_j + D_i \qquad (11)$$

and thus we can implement this algorithm on the system of Figure 1.

The vector $X_i$ is the $N_m^2$ dimensional lexicographically ordered vectorized version of the $X_{ij}$
neurons at P1, $D_i$ is the $N_m^2$ difference vector added to the P3 outputs, and M is a fixed (film-based)
matrix at P2 in Figure 1. As before, we add $D_i$ to the P3 output through Gate 1 (in Figure 1), and we use
Gate 2 to apply a nonlinearity to the output to ensure $0 \le X_i \le 1$. The final P1 neurons are now binary
(Gate 4).
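
The quadratic MTT iteration can be sketched in Python as follows. This is an illustrative sketch only: unit weights $c_1 = c_2 = c_3 = 1$ are assumed, the neurons are kept in 2-D form rather than vectorized, the differences are normalized absolute position differences of made-up 1-D measurements, and the step size and iteration count are choices made here.

```python
def mtt_quadratic_nn(D, eta=0.05, iters=500):
    """Evolve the assignment neurons X[k][l] for the difference matrix D."""
    N = len(D)
    X = [[0.5] * N for _ in range(N)]  # neutral starting state
    for _ in range(iters):
        row = [sum(X[k]) for k in range(N)]
        col = [sum(X[k][l] for k in range(N)) for l in range(N)]
        # Gradient of the energy, followed by truncation to [0, 1].
        X = [[min(1.0, max(0.0, X[k][l] - eta * (D[k][l]
                                                 + 2.0 * (row[k] - 1.0)
                                                 + 2.0 * (col[l] - 1.0))))
              for l in range(N)] for k in range(N)]
    return X


# Made-up 1-D positions in two frames; frame 2 lists the same objects reordered.
frame1 = [0.0, 5.0, 10.0]
frame2 = [5.2, 9.9, 0.1]
raw = [[abs(p - q) for q in frame2] for p in frame1]
scale = max(max(r) for r in raw)
D = [[d / scale for d in r] for r in raw]  # normalized differences

X = mtt_quadratic_nn(D)
# Final binary decision: the strongest neuron in each row gives the pairing.
assignment = [max(range(len(X)), key=lambda l: X[k][l]) for k in range(len(X))]
```

For this example the evolved state has one strongly "on" neuron per row and column, and `assignment[k]` gives the frame-2 measurement paired with frame-1 measurement k.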

Figure 3 shows the typical $X_{ij}$ neuron outputs (in 2-D, for ease of understanding) at different time
steps in their evolution. The amount of area shaded in the 2-D outputs denotes the neuron analog
output. As seen, the final output has one "on" (value ≈ 1) neuron per row and column (i.e. one
measurement pair assigned in each time frame). Excellent (100%) performance has been obtained
in noise for various scenarios with this algorithm [10].
5. SYMBOLIC AND PRODUCTION SYSTEM NNs
5.1  PRODUCTION  SYSTEM  NN 


When a predicate calculus set of rules as "if-then" statements describes a production system,
one can employ an NN to determine new true facts (activated output P3 neurons) given true input facts
(activated P1 neurons) and rules (as P1-to-P3 interconnection weights). Figure 4 shows the NN for the
simple set of 4 rules: A → B, A AND C AND F → G, B → C, F AND G → C. Each fact (A to G) is assigned to
one input and output neuron. To implement this on Figure 1, true facts are denoted by activated P1
neurons and new inferred facts are given by P3 neurons that exceed threshold. The rules (weights)
are a fixed P2 matrix. The P3 output is thresholded to produce binary neurons (Gates 2 and 4) for the
next P1 input. This produces new inferences and new rules not explicitly encoded.
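
The iterate-and-threshold inference can be sketched in Python as below. The rule encoding is an assumption of this sketch: each rule is given its own summing unit whose threshold equals its number of antecedents (the AND), and several rules may activate the same consequent (the OR). The first antecedent of the second rule is read here as A.

```python
FACTS = "ABCDEFG"

# (antecedents, consequent) pairs for the four example rules.
RULES = [
    (("A",), "B"),
    (("A", "C", "F"), "G"),
    (("B",), "C"),
    (("F", "G"), "C"),
]


def infer(true_facts, rules=RULES, max_iters=10):
    """Feed thresholded outputs back as inputs until no new fact is inferred."""
    state = {f: 1 if f in true_facts else 0 for f in FACTS}
    for _ in range(max_iters):
        new_state = dict(state)
        for antecedents, consequent in rules:
            activation = sum(state[f] for f in antecedents)  # weighted sum
            if activation >= len(antecedents):               # threshold acts as AND
                new_state[consequent] = 1                    # shared output acts as OR
        if new_state == state:
            break
        state = new_state
    return sorted(f for f, v in state.items() if v)


print(infer({"A", "F"}))  # prints ['A', 'B', 'C', 'F', 'G']
```

Starting from A and F, the first pass infers B, the second infers C, and the third infers G, which shows how facts not directly implied by the inputs emerge over iterations.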

5.2  SYMBOLIC  CORRELATOR  NN 

We have interfaced a production system to a multichannel optical correlator to achieve a
symbolic correlator [11]. This is a unique NN, since it is the only NN that allows multiple
objects to be handled in parallel with both shift and distortion invariance for scene analysis. Figure 5
shows the basic concept of this processor. The multichannel optical correlator provides a multiple
output (D-digit symbolic word) for each object present in the field of view (FOV). An optical
correlator thus allows multiple objects to be handled and provides true shift invariance. The multichannel
correlator used and its symbolic output allow many classes of objects to be identified (with D=4
channels or filters and L=10 output levels, over 10,000 object classes can be accommodated on one
processor). The recent simulated [12] and real time [11] optical laboratory data we have obtained
used filters to recognize the presence of various object parts; the encoding of these symbols was
then fed to a production system NN. Excellent distortion-invariant and multiple object results were
obtained [11].


6.  ADAPTIVE  LEARNING  PATTERN  RECOGNITION  (PR)  NN 

Various multi-layer NNs can be produced. With three neuron layers, any piecewise nonlinear
discriminant surface can be produced. Figure 6 shows a 3 layer NN with neuron layers P1 (input), P3
(hidden layer) and P5 (output). The P2 matrix in Figure 1 provides the P1 to P3 weights needed. We
implement the P3 to P5 (second to third layer) connections in Figure 6 in hardware through Gate 3 in
Figure 1 (this is realistic with the new NN we consider).

One advantage of an NN over standard PR classifiers is its organized ability to produce nonlinear
decision surfaces. We feel that a PR NN should utilize standard PR techniques where appropriate and
NN techniques where they are preferable. A marriage of PR and NN techniques is preferable. Our NN
(Figure 6) uses PR techniques (linear discriminant functions (LDFs) and clustering) to select the
number of P3 neurons and the initial P1 to P3 weights. NN techniques are then used to refine these
initial weights. A hybrid PR-NN thus results. We consider a multiclass PR classification problem with
one P5 neuron per class (the activated P5 neuron denotes the class of the input fed to P1). The P1
neuron representation space (Section 3) we use is a feature space with inherent shift and distortion
invariance and with a low dimensionality. This provides shift and distortion-invariant multiclass PR.
We select 2-5 neurons at P3 per class using clustering techniques. These are example, prototype, or
cluster neurons and hence we refer to this as an adaptive clustering neural net (ACNN). The initial P1
to P3 weights are selected using LDF methods. They are then refined using NN methods to produce
nonlinear piecewise decision surfaces.

Most NN learning algorithms are not easily realized with optical processing. We
envision the use of gated learning, with learning being off-line (on the HNC digital hardware section of
Figure 1) and with the weights being fixed after learning. We consider classification of input data to be an
on-line problem requiring real-time optical processing (on the P1-P3 optical system of Figure 1).
Thus, we employ a hybrid optical-digital NN.

Our present purpose is to show how many different NNs can be realized on the architecture of
Figure 1. Thus, we only briefly highlight our ACNN [13] realization on Figure 1 (or in general, a
multi-layer NN). We consider only supervised learning. The input P1 neurons are analog features. The
P1 to P3 weights are chosen as LDFs and are subsequently modified (in training on the digital NN) to
produce piecewise nonlinear discriminant surfaces. Our cluster selection method determines the
number of P3 neurons used and removes this variable from the NN. The P1 to P3 weights are analog
and encoded on a fixed mask at P2 of Figure 1 (after training). The P2 mask can be adapted as gated
learning proceeds. The most active P3 neuron is selected (Gate 2 in Figure 1) and binary P3 neurons
result (with one P3 neuron being the most active). The P3 to P5 weights are binary and perform a
mapping of the P3 cluster selected to the final class designation (the P5 neuron activated). Only one
pass through the system is required in classification. The P1-P3 neuron processing is optical. The P3
nonlinearity (Gate 2 in Figure 1) requires a maximum selection operation. The P3-P5 processing is
performed digitally (Gate 3 in Figure 1), since it is only a simple mapping and the number of P5
neurons is small. Figure 7 shows one result from this ACNN for a 3-class problem with 2 features.
The samples in each class are indicated by different symbols. The nonlinear decision surfaces
produced by the ACNN are indicated. Such surfaces are necessary to separate these data samples.
Over 98% correct recognition was achieved.
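
The single-pass classification described above (weighted sums from P1 to P3, a maximum selection at P3, and a binary cluster-to-class mapping to P5) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the prototype values, and the choice of weights (2p for each prototype p, with bias -|p|^2, so that the most active P3 neuron is the nearest prototype) are hypothetical and are not taken from the report.

```python
def acnn_classify(features, w13, cluster_to_class):
    """Toy ACNN forward pass: returns (class_label, winning_cluster_index)."""
    # P1 -> P3: each P3 cluster neuron forms a weighted sum (a linear
    # discriminant) of the analog features; the last entry of a row is a bias.
    activations = [
        sum(w * f for w, f in zip(row[:-1], features)) + row[-1]
        for row in w13
    ]
    # Gate 2: maximum selection leaves one active (binary) P3 neuron.
    winner = max(range(len(activations)), key=activations.__getitem__)
    # P3 -> P5: binary mapping from the winning cluster to its class.
    return cluster_to_class[winner], winner


# Hypothetical prototypes: two clusters for class 0, one each for classes 1, 2.
PROTOTYPES = [(0.0, 0.0), (1.0, 1.0), (4.0, 0.0), (0.0, 4.0)]
W13 = [[2 * px, 2 * py, -(px * px + py * py)] for px, py in PROTOTYPES]
CLUSTER_TO_CLASS = [0, 0, 1, 2]

print(acnn_classify((3.5, 0.2), W13, CLUSTER_TO_CLASS))
```

Because several P3 neurons can map to the same P5 class, the union of their linear regions gives a piecewise decision surface for that class.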


7.  SUMMARY  AND  CONCLUSION 


A  general  optical/digital  NN  architecture  and  its  hardware  were  described.  The  multi-functional 
nature  of  the  system  was  emphasized  -  with  the  same  processor  shown  to  be  capable  of  solving  a 
variety  of  NN  problems.  We  have  highlighted  many  of  these  uses.  The  system  functions  as  an 
associative  processor  (AP).  We  specifically  use  it  as  a  heteroassociative  processor  (HAP)  for 
distortion  invariant  pattern  recognition.  We  also  employ  it  as  a  closure  AP  (operating  on  facts).  The 
system  is  suitable  for  many  optimization  NNs.  We  have  highlighted  its  use  as  a  mixture  processor,  a 
multitarget  tracker  and  a  matrix  inversion  system.  In  the  first  2  cases,  an  external  vector  is  added  to 
the  M-V  neuron  output.  Both  analog  and  binary  neurons  are  used  (depending  upon  the  application). 
Analog weights are used. The use of the system as a production system and as a symbolic correlator
NN was noted (the latter NN handles multiple objects in the field of view). Finally, its use in adaptive
learning (distortion-invariant PR classification) was discussed, where it functions as a general
multi-layer NN capable of producing any piecewise nonlinear discriminant surface.

Input spectra           Noise percentage   Average number   Worst case error
                        n (%)              of iterations    in any x_e (%)

Pure, large             0                  738              3.39
Pure, small             0                  1094             11.33
Pure, large and small   0                  940              5.42
Pure, large             2.5                1155             5.16
Pure, large             5                  1153             4.91
Mixture, large          (missing)          2236             0.98
Mixture, large          2.5                3603             1.46
Mixture, large          5                  3537             1.86

Table 1. Simulation results for the determination of the
composition of an input element or mixture.


No. of Iterations   Accuracy of Result

55                  1.28%
88                  0.13%
111                 0.01%

Table 2. Optical neural net matrix inversion solution data.

Figure 1. General hybrid optical/digital neural net.

Figure 2. Simplified view of the optical matrix-vector multiplier.

[Figure 3 panels: Frame 4; target labels B, D, C, F, A, E; neuron states after 0, 30, 50, and 70 iterations]

Figure 3. Typical MTT NN output neuron states at different stages (iterations).


[Figure 4 label: outputs (control signals or just feedback)]


Figure  4.  Representative  production  system  NN. 


Figure  5.  Symbolic  correlator  NN  for  handling  multiple  objects. 

Figure 6. Three-layer adaptive clustering NN (ACNN).


[Figure 7 axis label: feature 1]

Figure 7. Representative piecewise nonlinear discriminant surfaces produced by our ACNN.

ABSTRACT


A hybrid optical/digital neural net is described. Initial tests on the optical components are
provided, together with the first simulated neural net results that include various optical system error
sources. Attention is given to the accuracy required for each optical component, the dominant error
source, and the cumulative effect of multiple optical system error sources.

1. INTRODUCTION


Various neural net (NN) architectures and algorithms have been advanced. Several of these have
been realized and tested in limited dimensionality and extent in the lab. In Section 2, we advance a
new and most general purpose NN. It is a hybrid optical/digital NN (using a digital NN for
training/learning and an optical NN for on-line processing). Its wide usage in a multitude of
applications will be detailed elsewhere [1]. Here we advance its basic concept and architecture
(Section 2), we consider a specific optimization NN application (mixture analysis) in Section 3, and
we provide the first simulation of optical NN error effects (Section 4). Prior NN simulations [2] have
not been successful, due to an insufficient NN model and/or the choice of an NN architecture not
easily lending itself to modeling.

2. HYBRID OPTICAL/DIGITAL NN ARCHITECTURE

Figure 1 shows the basic architecture we consider. It consists of a general-purpose hardware
digital NN (the Hecht-Nielsen Corporation (HNC) Anza system) interfaced to an optical NN. The
optical NN consists of an optical matrix-vector (M-V) multiplier. The vector data is fed to point
modulators at P1. The P1 light is broadcast to uniformly illuminate different rows at P2 (which contains
the matrix data). The light leaving P2 is integrated vertically onto a linear detector array at P3. The P3
output is thus the M-V product of the P2 matrix data and the P1 vector data. The optical M-V
architecture is the basic element of the system. Its outputs (P3) are processed in various manners
(depending upon the application) before being fed back to the P1 inputs. We also allow the P3 output
to be used to alter the P2 matrix data (with an adaptive P2 SLM used). The P3 to P1 digital feedback
shown accommodates various NNs, and the resultant processor is very general purpose [1].
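
The optical matrix-vector product described above can be modeled numerically as follows. This is a minimal sketch, assuming (as one reading of the text) that P1 element i illuminates row i of the P2 mask and that P3 detector j collects the light leaving column j; the function name and the sample values are hypothetical.

```python
def optical_mv(p1_vector, p2_mask):
    """Model of the M-V multiplier: sum over rows of (P1 intensity x mask transmittance)."""
    n_rows, n_cols = len(p2_mask), len(p2_mask[0])
    if len(p1_vector) != n_rows:
        raise ValueError("one P1 modulator is needed per P2 row")
    p3 = [0.0] * n_cols
    for i in range(n_rows):        # row i is lit with intensity p1_vector[i]
        for j in range(n_cols):    # mask element (i, j) attenuates that light
            p3[j] += p1_vector[i] * p2_mask[i][j]   # detector j integrates column j
    return p3


print(optical_mv([1.0, 0.5], [[0.2, 0.4], [0.6, 0.8]]))
```

In this layout the detector array sees the product of the transposed mask with the input vector, so the matrix to be applied must be recorded on the mask with that orientation in mind.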

FIGURE 1. General hybrid optical/digital neural net.

3. CASE STUDY


The specific case study we consider is input 1-D data which is a mixture of several reference 1-D
patterns. Specifically, for the case of input imaging spectrometer data, we consider the input to be
the sum of the several reference spectra (for different minerals) present in a given input region. The
objective is to determine which e of E spectra are present and the amount $x_e$ of each that is present.
A specific case study (such as this) is expected to quantify the spatial M-V system errors allowed
and the individual component requirements.

The signal $c = \{c_n\}$ received at each spatial region of the scene has N spectral components at the
wavelengths $\lambda_n$ of the imaging spectrometer. This received signal is a mixture of the reflectance data
$k^e = [K_1^e \cdots K_N^e]$ for mineral element e for all $\lambda_n$,

$$c_n = \sum_e x_e K_n^e. \qquad (1)$$

The objective is to determine the elements e present ($e = 1 \cdots E$) and the fractional amount $x_e$ of
each. The $k^e$ reflectances for E minerals are available and the data matrix $K = [k^1 \cdots k^E]$ describes
the reference data. We have spectra in N = 128 reduced bands for E = 600 elements. To solve (1) for
x, we consider a neural net (NN) solution. We write the MSE as one error term to be minimized

$$E_1(x) = \frac{1}{2} \sum_n \Big( c_n - \sum_e x_e K_n^e \Big)^2 \qquad (2)$$
where the $K_n^e$ are the reflectance data at wavelength $\lambda_n$ for element e. We also impose the condition
that the sum of the $x_e$ equal one

$$E_2(x) = \frac{1}{2} \Big( \sum_e x_e - 1 \Big)^2 \qquad (3)$$

The solution x that minimizes the error or energy $E = E_1 + E_2$ is desired.

To describe this as an NN optimization problem, we denote the solution x by a neuron array and
allow the neurons to evolve as

$$\frac{dx}{dt} \propto -\frac{\partial E(x)}{\partial x}. \qquad (4)$$

With (4), E decreases monotonically with time and no local minima occur. Substituting E into (4), and
using discrete time, we obtain

$$x(t+1) = \Psi[T x(t) + a], \qquad (5)$$

where $T = -\eta K^T K + I$, $a = \eta K^T c$, and $\Psi$ is a nonlinear function that satisfies the additional constraint

$$0 < x_e(t) < 1 \qquad (6)$$

and $\eta = 2/\mathrm{Tr}[K^T K]$ as in the Widrow-Hoff LMS algorithm.

We will compare the evolution solution in (5) to the pseudoinverse solution

$$x = (K^T K)^{-1} K^T c = K^+ c, \qquad (7)$$

which does not satisfy (6), to show that an NN solution is needed. The neural net solution in (5) can be
achieved on the optical system of Figure 1 as we now discuss. The matrix T is placed at P2 (it is fixed,
and film can be used for it in this application and in most optimization NNs). The P1 outputs are x
and at P3 we obtain T x. Since K is fixed, we form a in digital hardware. Thus, in Figure 1, the M-V
multiplication is performed optically and the external vector is calculated digitally and added to the
optical M-V result. This output is then thresholded and fed back to P1. The feedback in Figure 1
achieves this.
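
The iteration in (5) can be tried on a small example. The sketch below is an illustration only: it takes the nonlinearity to be a hard clip to the interval [0, 1] (an assumption, since the text only requires that the constraint (6) be met), it uses T and a exactly as defined after (5), and the 3-band, 2-element matrix K and the helper names are hypothetical.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def mixture_nn(K, c, iterations=500):
    """Iterate x(t+1) = Psi[T x(t) + a], T = I - eta K^T K, a = eta K^T c."""
    Kt = transpose(K)
    KtK = [matvec(Kt, col) for col in Kt]            # symmetric E x E matrix K^T K
    E = len(KtK)
    eta = 2.0 / sum(KtK[i][i] for i in range(E))     # eta = 2 / Tr[K^T K]
    T = [[(1.0 if i == j else 0.0) - eta * KtK[i][j] for j in range(E)]
         for i in range(E)]
    a = [eta * v for v in matvec(Kt, c)]
    x = [0.0] * E
    for _ in range(iterations):
        y = [tx + ai for tx, ai in zip(matvec(T, x), a)]   # M-V product plus external vector
        x = [min(1.0, max(0.0, v)) for v in y]             # assumed Psi: clip to [0, 1]
    return x


# Hypothetical reference spectra (columns of K) and a noise-free 70/30 mixture.
K = [[1.0, 0.2], [0.3, 0.9], [0.5, 0.4]]
c = matvec(K, [0.7, 0.3])
print(mixture_nn(K, c))
```

In the hybrid system, the product T x(t) is the step performed optically, while the addition of a and the clipping are performed in the digital feedback path.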


3. OPTICAL NN FABRICATION

For the present optimization NN, the matrices are fixed. This is the case for nearly all optimization
NNs. Thus, we consider the use of film for the matrix P2 data. We detail elsewhere [3] how to
optimally encode this matrix data on film. Figure 2 shows the optical M-V processor in more detail
and Figure 3 shows its electronic support. In Figure 3, the P1 neurons are formed from a 2-D liquid
crystal (LC) display [4] modified with all elements in a row fed with the same signal. The display has
20 rows of 40 elements (4.2 x 8.4 cm2); each LC pixel is 2.1 x 2.1 mm2. We employ it as a set of 20
stripe modulators. The device has TN LC material sandwiched between two glass plates, whose
surfaces adjacent to the LC material are coated with etched transparent conductors to form
electrically isolated elements. At present, we use a set of 4 x 4 LC pixels (8.4 x 8.4 mm2) in the center
of the display. This stripe P1 design is attractive since it avoids the need for external collimating and
imaging optics between P1 and P2. It allows P2 to be placed in contact with P1, thus simplifying
design. P2 data is recorded as a binary computer generated hologram (CGH) pattern using a new
encoding technique that provides high accuracy [3].

FIGURE 2. Simplified view of the optical matrix-vector multiplier.

A HeNe laser ($\lambda$ = 633 nm) serves as the present light source illuminating the LC at P1. The initial
optics illuminating P1 have been designed. We ensure uniform illumination of P1. The beam radius R
required to produce a uniformity X = 0.995 (0.5%) over a diameter $D = \sqrt{2}\,(8.4) = 11.88$ mm is
$R = D/[2(1-X)^{1/2}] \approx 84$ mm. This will pass only 0.5% of the light. This is sufficient with a 30 mW laser, but
can be improved by use of laser diode sources and CGHs. The initial P2 to P3 optics have been
designed using two cylindrical lenses (L2 with a focal length of 25 cm and L3 with a focal length of 16 cm).
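
The beam radius and the 0.5% throughput quoted above can be checked numerically. The sketch below assumes a Gaussian intensity profile exp(-r^2/R^2); the text does not state the profile, but this choice reproduces both the radius formula (through the small-deviation approximation) and the fraction of light passed. The function and variable names are hypothetical.

```python
import math


def beam_radius(diameter_mm, uniformity):
    """R = D / (2 sqrt(1 - X)), from exp(-(D/2)^2 / R^2) ~ 1 - (D/2)^2 / R^2 = X."""
    return diameter_mm / (2.0 * math.sqrt(1.0 - uniformity))


D = math.sqrt(2.0) * 8.4                 # diagonal of the 8.4 mm square, in mm
R = beam_radius(D, 0.995)                # about 84 mm
# Fraction of a Gaussian beam's power inside the aperture of diameter D.
fraction_passed = 1.0 - math.exp(-((D / 2.0) ** 2) / R ** 2)

print(round(D, 2), round(R, 1), round(fraction_passed, 4))
```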

At P3, we use TRW OP509SLC phototransistors with a plastic lens case. The detectors have a
1.3 mm diameter active area and are on 2.54 mm centers. The light from each 2.1 mm wide LC and P2
column is magnified by 2.54/2.1 = 1.21 to image onto the detectors. With 1.3 mm detectors, the width
of a P2 element can be no larger than 1.3/1.21 = 1.07 mm. We use 0.9 mm wide P2 elements to allow
a 0.17 mm guard band horizontally. We use 1.8 mm of the 2.1 mm height of each element (a 0.3 mm
guard band).

Thus, each P2 element has an active area of 1.8 x 0.9 mm2 (in the 2.1 x 2.1 mm2 area). The
Linotronic recorder we use to produce the P2 mask has 20 µm diameter spots on 10 µm centers.
Thus, in the 1.8 x 0.9 = 1.62 mm2 area of one element, we can record 4050 dots or weights with 4050
gray levels using dot corrected CGH techniques [3].
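
The detector and mask geometry above reduces to a few lines of arithmetic, sketched here as a check. The constant names are hypothetical. The dot count assumes one non-overlapping 20 µm x 20 µm cell per recorded dot, which is an interpretation that reproduces the quoted figure of 4050 rather than something stated in the text.

```python
DETECTOR_PITCH_MM = 2.54       # detector center-to-center spacing
DETECTOR_DIAMETER_MM = 1.3     # detector active diameter
LC_PIXEL_MM = 2.1              # LC pixel (and P2 cell) size
ELEMENT_W_MM = 0.9             # active width of a P2 element
ELEMENT_H_MM = 1.8             # active height of a P2 element
SPOT_MM = 0.020                # 20 um recorder spot (assumed cell size per dot)

magnification = DETECTOR_PITCH_MM / LC_PIXEL_MM              # about 1.21
max_element_width = DETECTOR_DIAMETER_MM / magnification     # about 1.07 mm
horizontal_guard = max_element_width - ELEMENT_W_MM          # about 0.17 mm
vertical_guard = LC_PIXEL_MM - ELEMENT_H_MM                  # 0.3 mm
dots_per_element = round(ELEMENT_W_MM * ELEMENT_H_MM / SPOT_MM ** 2)

print(round(magnification, 2), round(max_element_width, 2), dots_per_element)
```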

We have fabricated the major portion of the support electronics (Figure 3). The vector driver (one
op amp and CMOS SPDT switch per P1 input element) provides the ac zero-mean square wave
required by the LC. The input is a dc voltage and the output is a zero-mean square wave with a
peak-to-peak amplitude that is twice the unipolar input. The circuit can operate at 1 kHz (this is much
faster than the 20 Hz frame rate of the present LC). We have also fabricated the detector amps
(LF412 transimpedance op amps). The gain of each is individually adjusted to correct for variations in
channels and detector efficiencies. The maximum output is 5.0 V at the maximum expected light level (a
220 kΩ feedback resistor is used).

The data acquisition and generation system is now described. In this section, the outputs from
the P3 detector amps are A/D converted and fed to an IBM PC/AT, whose outputs are D/A converted and fed
to a demultiplexor S/H (sample and hold) circuit before being fed in parallel to the P1 vector drivers.
All analog signals are 0-5 V. All circuits use +5 and +7.5 V power supplies. This system consists of
an IBM PC/AT, a PC-Bus data acquisition board, and special demux S/H circuitry. The PC/AT runs at
6 MHz, has 512 kB of memory and a 30 MB hard disk. Control software is written in Turbo C and
80286 assembly code. It computes the input drive to P1 and outputs these signals. It also digitizes
the output P3 neuron data. The data acquisition board is a Data Translation DT2821. It contains one
12-bit 50 kHz A/D and two independent 12-bit 130 kHz D/As. A 16-channel multiplexor inputs the
parallel P3 detector data to the A/D. The board also contains 16 bits of TTL level signals for later I/O
control use. Presently, we digitize 4 detector outputs in 80 µs (50 kHz). The calculated P1 inputs for
the next cycle are sequentially D/A converted and fed to a demultiplexor S/H (an LF398 S/H circuit) to
allow them to be fed in parallel to the P1 drivers. Four parallel S/H circuits are presently used. The
D/A can cycle through a 32-element vector in 2.5 ms (7.7 µs per input or 30.8 µs for our present 4
inputs).

The noise level of the P3 detectors was measured to be 10 mV out of 5 V, for a $\sigma_n = 2 \times 10^{-3}$. We
measured other parameters of our system for use in simulations of error source effects. The contrast
ratio of the LC at P1 was measured to be 5690:1. This is much larger than the value typically reported,
due to the type of drive signal we use. We drive each modulator element with
a continuous signal. Conventionally, one applies a pulse voltage to one element and then returns
(time-multiplexed) to it later. The speed of the LC device was measured by applying a 2 Hz square
wave signal with various amplitudes. The results (Figure 4) show that a 50 ms switching time
(average of rise and fall times) results for drive voltages above 7 V RMS. Increasing the drive voltage
decreases LC rise time and increases fall time. All times are measured from the 10-90% points. We
linearize the LC transfer function by a 512-point table look-up. The results (Figure 5) show that a 512-
point linearity is obtained.
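
A table look-up linearization of this kind can be sketched as follows. The measured LC transfer function is not given in the text, so the sine-squared curve used here is purely a hypothetical stand-in for a monotonic response; the function names are likewise assumptions. For each of 512 equally spaced target transmissions, the table stores the drive level that produces it.

```python
import math


def lc_transmission(drive):
    """Hypothetical monotonic LC transfer function for a drive level in [0, 1]."""
    return math.sin(0.5 * math.pi * drive) ** 2


def build_lut(points=512):
    """Invert the transfer function by bisection at `points` target levels."""
    lut = []
    for k in range(points):
        target = k / (points - 1)
        lo, hi = 0.0, 1.0
        for _ in range(60):                 # bisection on a monotonic curve
            mid = 0.5 * (lo + hi)
            if lc_transmission(mid) < target:
                lo = mid
            else:
                hi = mid
        lut.append(0.5 * (lo + hi))
    return lut


lut = build_lut()
worst = max(abs(lc_transmission(lut[k]) - k / 511) for k in range(512))
print(len(lut), worst < 1e-9)
```

In use, the digital level k requested by the control software would index the table, and the stored drive value would be sent to the modulator instead of k itself.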

4.  OPTICAL  NN  SIMULATION 

We now provide the first models for the accuracy required in the various elements of any optical
NN. We consider our NN for our mixture NN application. We also compare the pseudoinverse
solution and show that an NN solution is required. We consider 4 mixture case studies with different
numbers of elements considered and with different numbers of non-zero elements: Case 1 (4
elements, 2 non-zero), Case 2 (4 elements, all non-zero), Case 3 (8 elements, 4 non-zero), and Case
4 (8 elements, all non-zero). Various amounts of elements were added, with the amount of each
varied. The elements used were taken from the 20 most common minerals. Noise (zero-mean,
Gaussian) was added to the data. The average of 10 runs (Cases 3 and 4) or 20 runs (Cases 1 and 2)
was used for each $x_e$ set (with different noise realizations). Each data point also represents the
average of 10 $x_e$ choices. Thus, each data point is the average of 100 or 200 different runs.


4.1  Pseudoinverse  versus  Neural  Net  Solutions 

When $x_e$ has most values near 0 or 1 (Cases 1 and 4), as occurs in all practical cases, we expect
the NN solution to be better. Figures 6 and 7 show the data obtained for the 4 cases with different
amounts of input noise (SNR). We see that the average neuron error is much less for all NN cases
(dashed lines) than for the pseudoinverse (solid line) cases and that the difference is greater at lower
input SNR and for cases with more zero-valued elements (Cases 1 and 3). The number of iterations
required also differs. With SNR = 20 dB, Case 3 required 50,000 iterations versus 1000 for Case 1
(due to the larger condition number). Mixtures with zero-valued $x_e$ require more (1000 versus 130)
iterations to converge. We started the NN iterations from the $x_e$ calculated from the pseudoinverse
solution (this is a useful new technique). Without this starting $x_e$ value and using an arbitrary initial x,
we needed 8000 versus 1000 iterations.


We now discuss the significance of an average neuron error of 0.02 (our acceptable level). For
Case 1, with 2 non-zero $x_e$, the average $x_e$ = 0.50 and an average error of 0.02 is a 0.02/0.50 = 4%
error. For Case 3, the average $x_e$ = 0.25 and 0.02 is an average error of 0.02/0.25 = 8%. In practice,
we expect few non-zero elements and thus Cases 1 and 3 are the most representative ones. They
have a lower (better) average error (Figures 6 and 7). The choice of 0.02 is also a good goal since
imaging spectrometer calibration accuracy is 5-7%.

Figure 4. Rise (solid), fall (dashed), and switching time (dotted) vs. voltage.

Figure 5. Transmission vs. digital output with look-up table.

Figure 6. Accuracy vs. noise for pseudoinverse (solid line) and neural net (dotted line).

[Plot axis label: Spectrum SNR, dB]

Figure 7. Accuracy vs. noise for pseudoinverse (solid line) and neural net (dotted line).

4.2  Simulation  Model 


We model the NN in Eq. (5) with errors as

$$x(t+1) = \Psi\Big[ x(t) + \eta \big[ a + N_g \big( r + n_x(t) - (N_m + K^T K)\, x(t) \big) \big] \Big]. \qquad (8)$$

We now discuss the errors included in (8). The detector P3 noise $n_x(t)$ is additive, zero-mean, and Gaussian
with standard deviation $\sigma_x$; it is time-varying and uncorrelated (a new noise realization is
used for each iteration). The additive P3 offset in the detectors (dark current) and detector
differential amps plus A/D quantization noise are modeled by $r$. The P3 neuron gain variations are
modeled by $N_g$. They are multiplicative and signal dependent. We assume negligible P1 errors
(Figure 5 confirms this, since P3 feeds P1). To include $N_g$ errors at P1, we would multiply the entire
right hand side of (8) by another $N_g$ factor. The P3 neuron errors equivalently also include the
effect of $N_g$ errors in P1 neurons, thus we do not add the extra factor. We represent neuron gain
errors by a diagonal matrix with diagonal elements $1 + n_g$ (where the $n_g$ are the gain variations). This
handles a multiplicative error (1 + error) times a vector. Errors in the connection matrix at P2 are
represented by the additive matrix $N_m$ of random uniformly distributed values added to T. This also
includes errors in the uniformity of the light incident on P1. P3 offsets can be reduced by adjusting
the P3 output amplifiers. Detector and P1 gain variations can also be adjusted by varying individual
amp and drive circuits. P1 offset and nonuniform input light effects can be corrected within the P2
mask. Thus, all errors we consider are residual. Our goal is to determine the dominant errors, the
level to which each must be reduced, how multiple errors combine, and the performance expected for
a given set of components with given specifications.
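
The kind of simulation described above can be sketched in a few lines. The sketch is one possible reading of the error model, not the authors' simulation code: offsets, gain errors, and matrix errors are drawn once per run as uniformly distributed residual errors, detector noise is redrawn at every iteration, gain errors multiply the detected optical signal only, and the nonlinearity is taken to be a clip to [0, 1]. The function name, the error magnitudes, and the small matrix K are hypothetical.

```python
import random


def simulate_with_errors(K, c, sigma_x=2e-3, r_max=1e-3, g_max=1e-3,
                         m_max=1e-4, iterations=300, seed=0):
    """Iterate the mixture NN with detector noise, offset, gain, and mask errors."""
    rng = random.Random(seed)
    N, E = len(K), len(K[0])
    KtK = [[sum(K[n][e] * K[n][f] for n in range(N)) for f in range(E)]
           for e in range(E)]
    a = [sum(K[n][e] * c[n] for n in range(N)) for e in range(E)]   # a = K^T c
    eta = 2.0 / sum(KtK[e][e] for e in range(E))
    # Residual hardware errors are fixed for the whole run.
    r = [rng.uniform(-r_max, r_max) for _ in range(E)]              # offsets
    gain = [1.0 + rng.uniform(-g_max, g_max) for _ in range(E)]     # 1 + n_g
    M = [[KtK[e][f] + rng.uniform(-m_max, m_max) for f in range(E)]
         for e in range(E)]                                         # K^T K + N_m
    x = [0.0] * E
    for _ in range(iterations):
        new_x = []
        for e in range(E):
            product = sum(M[e][f] * x[f] for f in range(E))         # optical M-V output
            noise = rng.gauss(0.0, sigma_x)                         # new realization each pass
            detected = gain[e] * (r[e] + noise - product)
            new_x.append(min(1.0, max(0.0, x[e] + eta * (a[e] + detected))))
        x = new_x
    return x


K = [[1.0, 0.2], [0.3, 0.9], [0.5, 0.4]]
c = [0.76, 0.48, 0.47]                      # noise-free spectrum of a 70/30 mixture
print(simulate_with_errors(K, c))
```

Sweeping one error magnitude at a time, with the others set to zero, gives curves of average neuron error versus error level of the type discussed in Section 4.3.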

In all cases, we used a fixed convergence threshold of $10^{-4}$ (i.e., the largest element in $a - Tx$ must
be < $10^{-4}$) to stop iterations. When the noise added is above $10^{-4}$, we average the last 20 correction
vectors and stop iterating when the average is less than or equal to $3.25\,n/\sqrt{20}$. This level
increases with the standard deviation n of the noise source.
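
The stopping rule above can be written as a small helper. This sketch assumes that the "average" being tested is the largest-magnitude element of the element-wise mean of the last 20 correction vectors; the text does not spell this out, and the function name and argument layout are hypothetical.

```python
import math


def converged(corrections, noise_std, window=20, threshold=1e-4):
    """corrections: list of correction vectors (a - T x), newest last."""
    if noise_std <= threshold:
        # Low-noise case: test the latest correction vector directly.
        return max(abs(v) for v in corrections[-1]) < threshold
    if len(corrections) < window:
        return False
    recent = corrections[-window:]
    mean = [sum(col) / window for col in zip(*recent)]
    # Noisy case: averaging 20 vectors reduces the noise by sqrt(20).
    return max(abs(v) for v in mean) <= 3.25 * noise_std / math.sqrt(window)


print(converged([[5e-5, -2e-5]], 0.0))
print(converged([[1e-3, -1e-3]] * 20, 2e-3))
```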


4.3  Error  Source  Results 


Figures 8-11 show the effects of the four error sources separately. From Figure 8, we see that
the effect of the additive detector noise is negligible, i.e., the average error is less than the standard
deviation of the noise. The expected detector noise measured was $\sigma_x \approx 2 \times 10^{-3}$ and, as seen, the
average neuron error at this $\sigma_x$ is much less than $10^{-3}$ and hence is negligible. Figure 9 shows the
effect of offset error $r$. To achieve an average error of 0.02, we require an offset variation of
±0.0025 (for the 4 neuron cases) and ±0.0013 (for the 8 neuron Case 3). With the 12-bit D/A and
A/D, we expect a quantization error of $\pm 1.2 \times 10^{-4}$. The detector op amps have 2 mV offset out of 5 V
or $\pm 4 \times 10^{-4}$. The expected total $r$ is thus $(2.4 + 8) \times 10^{-4} \approx 10^{-3}$. At these levels ($10^{-3}$), the
average error is negligible ($1.4 \times 10^{-4}$ for Case 4).
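
The offset budget quoted above can be reproduced with a short calculation. The sketch assumes that the quantization error is half of one least-significant bit of a 12-bit converter and that the total of (2.4 + 8) x 10^-4 is the sum of the full (plus and minus) ranges of the two contributions; both are interpretations of the text, and the names are hypothetical.

```python
ADC_BITS = 12
FULL_SCALE_V = 5.0
OP_AMP_OFFSET_V = 2e-3

quantization_error = 0.5 / 2 ** ADC_BITS          # +/- half an LSB, as a fraction of full scale
amp_offset = OP_AMP_OFFSET_V / FULL_SCALE_V       # +/- 4e-4 of full scale
total_offset = 2 * quantization_error + 2 * amp_offset   # (2.4 + 8) x 1e-4

print(quantization_error, amp_offset, total_offset)
```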

The effect of $N_g$ errors is more significant (Figure 10). To keep the error below 0.02, we require
the gain to be uniform within $1.25 \times 10^{-3}$ (Case 2) or $5.5 \times 10^{-4}$ (Case 3). For our lab system, the
detector gains can be matched to $10^{-3}$ and the detector amps are matched to $10^{-4}$. The additive $N_m$
matrix errors are the most dominant errors (Figure 11). This is expected, since they alter the problem
and the energy surface, and the required matrix accuracy increases with the condition number of the
matrix T. We require an accuracy of $\pm 1.8 \times 10^{-4}$ ($N_m = 3.6 \times 10^{-4}$) for Case 1 and $\pm 10^{-4}$ ($N_m = 2 \times 10^{-4}$)
for Case 3. We can record matrix elements with an accuracy of $1/4050 = 2.5 \times 10^{-4}$ and thus
expect acceptable results. A beam uniformity of 0.005 incident on P1 will yield unacceptable 0.1
average errors. To achieve this amount of uniformity, we must correct with the P2 mask to achieve
acceptable results in all cases.

We combined all noise sources and found that they add in an RMS fashion and that the mask
accuracy and P1 beam uniformity are the critical parameters.

Optical neural net for classifying imaging spectrometer data

Etienne  Barnard  and  David  P.  Casasent 


A  problem  in  surface  mineralogy  is  addressed;  namely,  how  does  one  determine  the  composition  of  a  mixture 
from  its  spectrum?  A  neural  net  algorithm  arises  naturally,  and  we  detail  the  state  equations  of  this  net.  An 
optical  architecture  and  simulation  results  are  presented. 


I.  Introduction 

We consider the following problem: Given the spectra of a number of elements, determine the
composition of an unknown input mixture from its measured spectrum. This problem has been
studied in the context of surface mineralogy [1], and conventional digital algorithms for solving it
have been proposed [2]. These algorithms are fairly slow, since serial computation is used. Neural
nets [3,4] are ideally suited for applications such as this one, because of the high parallelism
achievable with them (as we will show). Thus, we sought to express the determination of the
compositions as a problem suitable for solution by neural nets; this was done using the Hopfield
minimization procedure. It is preferable that one implement neural nets with hardware capable of
achieving their high degree of connectivity. Therefore, many researchers have investigated optical
implementations of these nets [5]. We also consider an optical architecture to implement
our algorithm.

In Sec. II, a mathematical description of the problem is developed. This is utilized as a basis for a
neural network solution and an optical implementation, described in Sec. III. In Sec. III we also
consider an alternative approach to our neural algorithm. Initial simulation results are presented in
Sec. IV, and Sec. V summarizes our results.

II. Mathematical Description

Let $K^e(\lambda)$ denote the spectrum of mineral e, where $\lambda$ denotes wavelength. There are E such
minerals, so that e ranges from 1 to E. We shall discretize the spectra, so that they are measured at
the N wavelengths $\lambda_n$, $n = 1, \ldots, N$. Then each possible mineral is described by a vector
$k^e = (K_1^e, K_2^e, \ldots, K_N^e)$, where $K_n^e = K^e(\lambda_n)$. Similarly, the spectrum of the input mixture
is described by a vector $c = (c_1, c_2, \ldots, c_N)$. Our objective is to decompose the unknown input
mixture into known elements (i.e., to determine the fractional amount $x_e$ of each basic mineral
present). This is given by the vector $x = (x_1, x_2, \ldots, x_E)$, where $0 < x_e < 1$ and
$\sum_{e=1}^{E} x_e = 1$. For the mixture described by x, the spectral response at wavelength n is
$\sum_e x_e K_n^e$. The difference between this vector and the measured spectral vector is a measure
of how well x describes the input mixture. One can form a variety of scalar measures from this. The
simplest such scalar measure is the Euclidean distance. As we shall see below, this is also the
correct measure to use from probabilistic considerations. We thus determine how well a mixture
vector x describes an input spectrum by considering the error measure

$$\sum_n \Big( c_n - \sum_e x_e K_n^e \Big)^2. \qquad (1)$$

This error is zero if the composition vector x describes the measured spectrum exactly. Otherwise, it
is greater than zero. To determine the composition of an unknown input mixture, we must minimize
this error with respect to x. This ensures that the best possible match between the measured and
predicted spectra is obtained. This minimization procedure can be viewed as a maximum-likelihood
determination of the composition of the mixture, if we assume that the difference between the
measured spectrum and the actual spectrum (as determined by the composition) is due to normally
distributed zero-mean noise. Then the maximum-likelihood estimation of the composition of the
mixture is given [6] by the minimum of Eq. (1).

III.  Neural  Network  Description 

This problem is not amenable to a conventional pattern-recognition solution, because there are no
predetermined classes into which we can classify the input spectral data. That is, any combination
of minerals has to be considered. On the other hand, the mathematical description derived in Sec. II
is suitable for a neural network implementation. The application of neural nets to minimization
problems is well established [7].

The general procedure is to describe the minimization problem as the minimization of an energy
function E, which depends on a set of variables $x_j$, where $j = 1, \ldots, J$. There should be no
difficulty in distinguishing between the energy and the number of elements; thus we retain the same
symbol E for both. The objective is to find the set $\{x_j\}$ that minimizes E. A set of neurons is
employed, with one neuron representing each of the variables $x_j$. To minimize E, we introduce
a discrete time variable t, and design a neural network that evolves the neuron activities in time
according to

$$x_j(t+1) = x_j(t) - \eta\, \frac{\partial E}{\partial x_j(t)}, \qquad (2)$$

where $\eta$ is a parameter that controls the speed of convergence. It is easy to show that E in Eq. (2)
decreases as time progresses [7], and that the net reaches a stable steady state when E attains its
minimum value, since only then is no further decrease possible.

In our imaging spectrometer application, the
minimization variables are the composition fractions x_e,
and the energy function is a modified version of the
error defined in Eq. (1). This modification is necessary
because nothing in Eq. (1) forces the sum of the
fractions to one. This constraint is enforced separately
in the usual way by adding a positive semidefinite
term to Eq. (1). This term attains its minimum value
of zero when the fractions x_e add to one. The simplest
term to add is A(Σ_e x_e − 1)^2, where A is a positive
constant. This is minimized when the sum of the
fractions x_e equals one. Thus, the energy function for
our application is

$$E = \frac{1}{2} \Big\{ \sum_n \Big( c_n - \sum_e x_e K_{en} \Big)^2 + A \Big( \sum_e x_e - 1 \Big)^2 \Big\}. \qquad (3)$$


The factor of 1/2 is included for later convenience (the
problem remains unchanged if the whole energy function
is scaled by a constant factor). The constant A
weights the relative importance of the two energy terms
in Eq. (3); it must be chosen large enough that Σ_e x_e ≈ 1
for all states with low energy. Inserting Eq. (3) into
Eq. (2), the evolution equation for our neurons becomes

$$x_f(t+1) = x_f(t) + \eta \Big\{ \sum_n K_{fn} c_n + A - \sum_e \Big[ \sum_n K_{en} K_{fn} + A \Big] x_e(t) \Big\}, \qquad (4)$$


where  the  indices  e  and  f  refer  to  different  neurons. 
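
As a check on the step from Eq. (3) to Eq. (2), the gradient can be written out term by term. The sketch below assumes the energy has the form E = (1/2){Σ_n (c_n − Σ_e x_e K_en)^2 + A(Σ_e x_e − 1)^2}, with c_n the measured spectrum and K_en the spectrum of mineral e; the symbols are taken from the surrounding text.

```latex
\begin{align*}
\frac{\partial E}{\partial x_f}
  &= -\sum_n K_{fn}\Big(c_n - \sum_e x_e K_{en}\Big) + A\Big(\sum_e x_e - 1\Big) \\
  &= \sum_e \Big[\sum_n K_{en}K_{fn} + A\Big] x_e - \Big(\sum_n K_{fn} c_n + A\Big), \\
x_f(t+1) &= x_f(t) - \eta\,\frac{\partial E}{\partial x_f}
  = x_f(t) + \eta\Big\{\sum_n K_{fn} c_n + A
     - \sum_e \Big[\sum_n K_{en}K_{fn} + A\Big] x_e(t)\Big\}.
\end{align*}
```

The bracketed quantity multiplying x_e is the matrix that, after scaling by η, becomes the interconnection matrix, and the remaining terms form the bias vector.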

To rewrite this equation in matrix-vector notation,
we note that the set {Σ_n K_en K_fn} forms an E × E matrix
with horizontal index e and vertical index f. Its entries
are the vector inner products of the spectra for minerals
e and f. Similarly, the matrix-vector product Σ_n K_fn c_n
is a vector with E entries. Using matrix-vector
notation, Eq. (4) can be written as

$$\mathbf{x}(t+1) = \mathbf{x}(t) - T\mathbf{x}(t) + \mathbf{a}, \qquad (5)$$

where the neuron states are now a vector x. The
interconnection matrix T has elements

$$T_{ef} = \eta \Big( \sum_n K_{en} K_{fn} + A \Big) \qquad (6)$$

and is formed by adding A to every element of the
matrix {Σ_n K_en K_fn} and multiplying by η. The summation
over e in Eq. (4) is achieved by the matrix-vector
product in Eq. (5). The vector a has components

$$a_f = \eta \Big( \sum_n K_{fn} c_n + A \Big). \qquad (7)$$

To obtain this from the vector Σ_n K_fn c_n, we add A to
every component and multiply the resulting vector by η.

One point has been neglected so far: nowhere in our
neural description have the fractions x_e been forced to
lie in the range [0,1]. Even though we minimize E
when Σ_e x_e = 1, we need to ensure that each neuron
activity x_e is positive and less than one, since it
represents a fraction. This constraint is enforced by
applying a nonlinear operator Φ to Eq. (4), where

$$\Phi[y] = \begin{cases} y & \text{for } y \in [0,1] \\ 0 & \text{for } y < 0 \\ 1 & \text{for } y > 1. \end{cases} \qquad (8)$$

Figure 1 shows the value of this nonlinear operator as a
function of its input y. If a neuron change would
increase x_e beyond one, we set x_e = 1; similarly, if x_e
would decrease to a value less than zero, x_e is set equal
to zero. Thus the neuron evolution equation becomes

$$\mathbf{x}(t+1) = \Phi[\mathbf{x}(t) - T\mathbf{x}(t) + \mathbf{a}]. \qquad (9)$$


Thus, the steps involved in updating x(t) [to obtain x(t
+ 1)] are:

1. calculate the matrix-vector product Tx(t),

2. subtract this from a,

3. add the resulting vector to the previous neuron
state vector x, and

4. threshold as prescribed by the function Φ.
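
The four update steps can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' simulation code: the function names (`build_network`, `clip01`, `iterate`), the list-of-lists layout of the spectra `K`, and the default step size (the reciprocal of the trace of the unscaled matrix) are assumptions made for the example.

```python
def build_network(K, c, A=1.0, eta=None):
    """Build T and a as in Eqs. (6)-(7).

    K is a list of E mineral spectra (each a list of N samples) and c is the
    measured spectrum (N samples). All names here are illustrative.
    """
    E = len(K)
    # Inner products of the mineral spectra, with A added to every element.
    G = [[sum(p * q for p, q in zip(K[e], K[f])) + A for e in range(E)]
         for f in range(E)]
    if eta is None:
        # Assumption: step size from the trace of the unscaled matrix.
        eta = 1.0 / sum(G[e][e] for e in range(E))
    T = [[eta * g for g in row] for row in G]
    a = [eta * (sum(k * cn for k, cn in zip(K[f], c)) + A) for f in range(E)]
    return T, a


def clip01(y):
    """Nonlinear operator of Eq. (8): limit y to the range [0, 1]."""
    return min(1.0, max(0.0, y))


def iterate(T, a, x, tol=1e-4, max_iter=100000):
    """Repeat x <- clip(x - T x + a) until no activity changes by more than tol."""
    n = len(x)
    for it in range(max_iter):
        Tx = [sum(T[f][e] * x[e] for e in range(n)) for f in range(n)]  # step 1
        x_new = [clip01(x[f] + a[f] - Tx[f]) for f in range(n)]         # steps 2-4
        if max(abs(u - v) for u, v in zip(x_new, x)) < tol:
            return x_new, it + 1
        x = x_new
    return x, max_iter
```

For example, with two toy spectra `K = [[1, 0, 1, 0], [0, 1, 0, 1]]` and a measured spectrum `c = [0.3, 0.7, 0.3, 0.7]`, calling `iterate(*build_network(K, c), [0.5, 0.5], tol=1e-8)` should return activities close to the mixing fractions 0.3 and 0.7.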

An optical implementation of Eq. (9) is shown in Fig.
2, where P_1-P_3 is a standard optical matrix-vector
multiplier, which multiplies the matrix M_1 at P_2 by the
vector at P_1. The matrix M_1 is the matrix T. It is
fixed by the spectral properties of the E known minerals,
and can be recorded on a film mask (M_1 in Fig. 2).
Since A and the spectral measurements K_en are greater
than or equal to zero, all the entries in T are positive.
The intensity of LED e in array LED_1 is proportional
to x_e(t). The light from the LEDs in P_1 is expanded
horizontally by lens L_1, so that each LED illuminates a
row of M_1. Lens L_2 integrates (vertically) the light
leaving the columns of M_1.

The vector a depends on the measured spectrum of
the input mixture to be analyzed, but does not change
from one iteration to the next. It consists of two parts
as in Eq. (7). The first term, a matrix-vector product,
is calculated on the P_4 to P_6 optical system. The
second term, a constant bias, is added after the detector
array (Det) at P_6. The matrix at P_5 is also fixed (it
is E × N, and consists of the spectra of the E minerals).
Thus M_2 can be a film mask, while the input to the
LED array LED_2 is proportional to the spectral samples
of the measured mixture to be analyzed. The
outputs from the two linear detector arrays at P_3 and
P_6 are subtracted in electronics to form a − Tx.
Alternatively, the vector a can be calculated electronically,
since it does not change between iterations (it only
changes when a new spectrum is input).

The previous neuron state x(t) must be added to the
optically calculated vector a − Tx. This can be done
on the output detectors with electronics. It can also be
achieved by using I − T for M_1, which now requires
negative number encoding, such as space multiplexing,
and the same external detector electronics. The
outputs of the operational amplifiers are limited to lie
between 0 and 1 (in the appropriate units). If the net
has not converged, this vector is fed back to LED_1 for
the next iteration. The arrangement (without negative
number encoding and with a bias ηA added
electronically to the amplifiers) is attractive because it
allows us to control η by controlling the gain of the
LEDs in P_1 and P_4 and the output bias. Adapting η
during the iteration process is useful since it allows us
to speed up the convergence of the neural net, as is
explained in Sec. IV.

One obvious alternative to our neural algorithm has
to be considered. Since we express our problem as the
minimization of a quadratic energy function, the
following is a plausible alternative method: Write the
quadratic energy function as

$$E = \mathbf{x}^{T} A \mathbf{x} + \mathbf{c}^{T} \mathbf{x}. \qquad (10)$$

(It is assumed, without loss of generality, that A is
symmetric.) To obtain the minimum of E analytically,
differentiate Eq. (10) with respect to x and set the result
equal to zero. This gives

$$\mathbf{x} = -\tfrac{1}{2} A^{-1} \mathbf{c}. \qquad (11)$$

The problem with this approach is that
the solutions do not satisfy the constraint that the x_e
should lie in the range [0,1]. (The constraint that the
fractions should sum to 1 can be enforced by rescaling
the result obtained.) No simple analytic way of
enforcing the range constraint exists; therefore our
current method is preferred.

Fig. 2. Optical neural net architecture for the minimization
procedure. [Figure labels: fractional weight solutions x_e;
spectrum of input mixture.]

IV. Simulation Results

The neural net of Fig. 2 and the algorithm in Eq. (9)
were simulated for the case of N = 826 wavelength
samples and E = 10 elements. The minerals used were
ten of the most common minerals (Table I). Their
reflectance spectra contained samples at 1-nm intervals
from 400-799 nm, and samples at 4-nm intervals
from 800-1500 nm. The measurements were supplied
by the Jet Propulsion Laboratory. A few examples of
the reflectance spectra used (indicating the percentage
of light that is reflected at the specified wavelength)
are shown in Fig. 3. The mixture included in Fig. 3
consisted of 50% montmorillonite and 50% dolomite.
The spectra of the mixtures were generated from the
spectra of the minerals, using the linear model
described in Sec. II. We also used spectra for different
grain sizes of the minerals, since the grain size affects
the reflectance spectrum. We refer to these as large
(>125 μm) and small (<45 μm) grain sizes.

Table I. Minerals and Mixtures Used in the Simulations

Common minerals used
  Kaolinite          Illite
  Alunite            Jarosite
  Gypsum             Chalcedony (a quartz)
  Montmorillonite    Chlorite
  Calcite            Dolomite

Mixtures used
  Dolomite/montmorillonite
  Gypsum/dolomite/calcite
  (Various percentage compositions of both)

Fig. 3. Typical spectra of four minerals, and a mixture of two of the
minerals.

Three sets of experiments were performed. In the
first set, we investigated the ability of the neural net to
determine the identity of an input spectrum that was
one of the ten minerals (with only large, only small, and
with both small and large grain sizes used). In the first
case, the spectra K_en used to calculate the connection
matrix T_ef in Eq. (6) belonged to samples with large
grain sizes, whereas all the spectra used in the second
case were obtained from small grain sizes, and the third
case used some samples with large grain sizes and
others with small grain sizes. Each of these 3 tests in
the first set of experiments consisted of 100 runs. Each
of the 10 original spectra K_e was used in turn as the
input spectrum c for 10 of the runs. The 10 runs with
the same input spectrum differed from one another
only in the different initial neuron conditions that
were used, as we describe below. The inputs in this
first set of experiments are referred to as pure inputs
(i.e., only one mineral was present).

The second set of experiments investigated the ability
of our algorithm to recognize the spectra of the pure
large grain-size minerals in the presence of noise.
Each spectral measurement was perturbed by n%
noise. This was achieved by adding a uniformly
distributed random number to each measurement. This
produced a random variation in the value of the
reflectance by at most n% of its original value at each λ_n.
Values used for n were 2.5% and 5%, with 100
experimental runs executed for each noise value, as above.
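
The noise model described above can be sketched as follows. The text states only that a uniformly distributed random number is added so that each reflectance changes by at most n% of its value; the symmetric range from -n% to +n% and the function name `perturb` are assumptions of this sketch.

```python
import random


def perturb(spectrum, n_percent, rng=random):
    """Return a noisy copy of a spectrum.

    Each sample is changed by a uniformly distributed amount of at most
    n_percent of its original value (symmetric range assumed).
    """
    frac = n_percent / 100.0
    return [r * (1.0 + rng.uniform(-frac, frac)) for r in spectrum]
```

Passing a seeded `random.Random` instance as `rng` makes a run repeatable, which is convenient when the same noisy spectrum is to be reused with several initial neuron conditions.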

Finally, we studied the performance of the neural
net when mixtures were used as input without noise
and with the same two nonzero noise levels used above
present (2.5% and 5%). For each noise level, 10 different
mixtures were used as input. These mixtures
consisted of different fractional compositions of the
elements of the 2 mixtures listed in Table I. For each
noise level, 100 runs were again performed (10 input
mixture spectra times 10 initial neuron conditions).

For each experimental run, we proceeded as follows:
the 10 × 10 connection matrix T in Eq. (6) was calculated
using the 10 spectra of the pure minerals, as
described above. We set A = 1.0 to weight the two terms
in Eq. (3) equally, since both terms are of approximately
equal importance to us. Ten neurons were used in
each experiment, because 10 minerals were used.
Initially, a random number uniformly distributed
between 0 and 2/E = 0.2 was assigned to each neuron.
This ensures that the expected value of the sum of the
neuron activities equals 1, as it should because the
neuron activities represent fractional compositions.
The 10-element vector a was then calculated using Eq.
(7). This vector depends on the input spectrum to be
identified. The step size parameter η was initially
chosen equal to 1/Trace(T), since this is the largest
value of η which is certain to lead to convergence. As
the net converged to its stable state, convergence was
accelerated (to reduce the number of iterations) by
doubling η whenever the sum of the squares of the
differences of the neuron activities at successive time
steps decreased by a factor of 4. Since the change in
neuron activity x(t + 1) − x(t) is linear in η from Eq.
(4), the sum of the squares of this difference will be
proportional to η^2. Thus, η should be scaled in
proportion to the square root of the sum-of-squares. We
changed η each time the sum of the squared differences
had decreased by a factor of four, since this choice gave good
performance. Whenever the errors increased from
one iteration to the next, we divided η by 1.5, where the
empirical factor of 1.5 has been found to be suitable for a
large range of data. This ensures that the system will
remain stable by reducing η when it becomes too large
for stable convergence.
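
One possible reading of this step-size schedule is sketched below. Several details are assumptions, since the text does not spell them out: "the errors" is taken to mean the energy E, the reference value for the factor-of-4 test is reset to the current sum of squares whenever the step size changes, and the function name `adapt_step` is hypothetical.

```python
def adapt_step(eta, sq_change, ref_sq_change, energy, prev_energy):
    """Return (new_eta, new_ref_sq_change) for one iteration.

    sq_change     -- sum of squared activity differences on this iteration
    ref_sq_change -- the same quantity when eta was last changed (assumed)
    energy        -- current value of E; prev_energy is the previous value
    """
    if energy > prev_energy:
        # Error went up: back off by the empirical factor of 1.5.
        return eta / 1.5, sq_change
    if sq_change <= ref_sq_change / 4.0:
        # Changes have shrunk by a factor of 4, so a doubled step is tried.
        return 2.0 * eta, sq_change
    return eta, ref_sq_change
```

Because the matrix T and the vector a both carry the factor η, a change in η also means rescaling T and a (or, in the optical system, the LED gain and the output bias) before the next iteration.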

The net was iterated according to Eq. (9) until it
converged (that is, until no neuron activity x_e changed
by more than a prespecified amount). For the pure
inputs, the terminating tolerance was always chosen to
be 10^-4, since this leads to reasonable accuracy (<6%)
in most cases, without requiring an excessive number
of iterations. For the mixture inputs a smaller
terminating tolerance (10^-5) was required, because the
correct answers are now not simply zero or one. These
tolerances mean that the difference in the value
calculated for any element x_e on successive iterations t and t
+ 1 was less than 10^-4 or 10^-5.

Our stopping criterion used small differences in the
neuron states x between iterations. These are used to
determine digitally when we enter the minimum-E
region of the E vs x curve. With a lower processor
accuracy (such as we would expect with an analog
optical neural net), the processor accuracy is also the
accuracy to which we can calculate each of the
x-values. We should thus be able to obtain (for a 1%
accurate processor) a final x-state within 1% of the
energy minimum. This issue merits further research.


For the specific imaging spectrometer least-squares
problem, monitoring the change in E rather than the
change in x between iterations would also provide a
useful stopping criterion. Our choice is more general,
however, since it will provide a useful stopping criterion
even if the energy magnitude of the best possible
solution is not known.

Table II shows the results obtained. Column 1
describes the type of spectra K_en used for a given run.
The first three tests (experimental set 1) used one
input mineral with no noise present. The next two
tests also used one input mineral, but with noise added
(set 2). The final three tests involved mixtures as
input, with and without noise (set 3). Column 2 lists
the percentage of noise n by which the spectra were
perturbed. Column 3 lists the number of iterations
required to reach the stopping criterion in each case,
and column 4 gives the precision of the result (the
largest amount by which any stable neuron state
differed from the true compositional fraction). Note
that this is a worst case precision error.

We see that the neural net was successful in classifying
the input spectra to reasonable accuracy. The
worst performance occurred when small grain sizes
were used. This occurred because the shapes of the
spectra of several of the minerals were very similar
(see, for example, the spectra of illite and chlorite in
Fig. 3). When the spectrum of only one of them (illite)
was present, the net converged to a state containing a
mixture of these two minerals (see test 2 in Table II).
In this case, we decreased the terminating tolerance
and found that after 5000 iterations the net was still
slowly converging towards the correct solution. For
such minerals, it is preferable initially to classify both
into one class and then use postprocessing (e.g., using a
smaller neural net with only those minerals detected in
the first pass represented by neurons) to determine
which was actually present. Alternatively, a
representation more sensitive to the fine structure of the
spectra (such as the derivative of the spectrum with respect
to wavelength or the use of only a few wavelengths) can
be used to discriminate such similar spectra. In all
other cases the maximum error averaged over 100 runs
per test was less than 5.5%. In the mixture results
better precision was obtained at the cost of an
increased number of iterations by decreasing the
terminating tolerance. The results in Table II also indicate
good performance in the presence of additive noise.
When 2.5% noise was added to the spectra, no
noticeable degradation in the precision occurred, but the net
required more iterations to converge. Increasing the
noise to 5% did not appreciably affect the performance
of the net.

Table II. Simulation Results for the Determination of the Composition of
an Input Element or Mixture

Input spectra            Noise          Average       Worst case
                         percentage     number of     error in any
                         n (%)          iterations    x_e (%)

Pure, large              0              738           3.39
Pure, small              0              1094          11.33
Pure, large and small    0              940           5.42
Pure, large              2.5            1155          5.16
Pure, large              5              1153          4.91
Mixture, large           0              2236          0.98
Mixture, large           2.5            3603          1.46
Mixture, large           5              3537          1.86

V.  Summary  and  Conclusion 

A new optical neural net architecture and algorithm
were introduced to find the composition of a mixture
given its spectrum and the spectra of the possible
minerals. A quadratic cost function is minimized to
find the optimal composition. This includes the
constraint that all fractions sum to unity. The constraint
that all compositional fractions lie between zero and
one is enforced by using neurons with a nonlinear
transfer function.

Simulation results were presented that indicate that
the neural net algorithm performs satisfactorily.
Difficulties are due to input spectra that are very similar
in shape. Techniques to overcome this were
addressed. In general, the number of iterations required
for the net to converge was large. This indicates that
serial simulations of the net are not realistic for large
applications, and emphasizes the necessity of using a
parallel system such as our optical architecture.

CHAPTER 5

"Multitarget Tracking with Cubic Energy Optical Neural Nets"


Multitarget  tracking  with  cubic  energy  optical  neural  nets 

Etienne  Barnard  and  David  P.  Casasent 


A  neural  net  processor  and  its  optical  realization  are  described  for  a  multitarget  tracking  application.  A  cubic 
energy  function  results  and  a  new  optical  neural  processor  is  required.  Initial  simulation  data  are  presented. 


I.  Introduction 

Considerable interest currently exists in neural
networks [1-2] due to their adaptive properties, fault
tolerance, and high computational throughput. One can
distinguish current neural processors by whether they
concern pattern recognition and associative
memories [3-5] or multivariate optimization [6-7]. Our concern is
with the application of neural networks in optimization
problems. As a specific case study, we consider
multitarget tracking.

In Sec. II, we briefly review the evolution equations
as used in neural minimization. Section III contains a
definition of the specific problem we consider, and Sec.
IV is a formulation of the constraints in our multitarget
tracking problem as an energy function to be minimized.
New optical architectures for the implementation
of the equations in Sec. IV are then described (Sec.
V). Simulation results are presented in Sec. VI. Our
work contains three new ideas: the application of the
Hopfield model to a multitarget tracking problem; the
use of a nonquadratic energy function in the minimization
problem; and an optical architecture which can
calculate the evolution of a system with such a
nonquadratic energy function.

II. Neural Model

We use the Hopfield model [6] as a minimization
network. We represent the state of the neurons by X_i(t),
where t is a time variable and i labels a particular
neuron within the set of neurons. For an optimization
problem, we wish to find the set of X_i that minimizes
an energy function E, which is a function of the neural
activities X_i. In this model, the evolution of the activity
of each neuron (its rate of change with time) is
described by

$$\frac{dX_i}{dt} = -\frac{\partial E}{\partial X_i}. \qquad (1)$$


The time evolution of the energy function is

$$\frac{dE}{dt} = \sum_i \frac{\partial E}{\partial X_i} \frac{dX_i}{dt}. \qquad (2)$$

To show that the model in Eq. (1) minimizes E, we
substitute Eq. (1) into Eq. (2) and obtain

$$\frac{dE}{dt} = -\sum_i \Big( \frac{\partial E}{\partial X_i} \Big)^2. \qquad (3)$$

Equation (3) shows that E is a decreasing function of
time t. The energy E will converge to a local minimum
as t progresses. Thus, the set of neural activities {X_i}
in the final stationary state describes a minimum energy
state of the system.

In our work, this basic algorithm is modified by
using discrete time [8] and by employing binary neuron
activities X_i. With binary X_i, the neuron activity
(neuron state) in Eq. (1) can now be replaced by

$$X_i = \begin{cases} 1 & \text{if } \partial E / \partial X_i < 0, \\ 0 & \text{if } \partial E / \partial X_i > 0, \end{cases} \qquad (4)$$

that is, the state of neuron i is binary and depends on
the energy as noted. The choice in Eq. (4) ensures that
the energy function is approximately minimized in the
stationary state, as we now show.

With unit time steps, we replace dX/dt by ΔX and
dE/dt by ΔE. To find the change in energy due to a
state change of neuron i, we recall the Taylor expansion
of E({X_i + ΔX_i}) in the vicinity of E({X_i}):

$$E(\{X_i + \Delta X_i\}) = E(\{X_i\}) + \sum_i \frac{\partial E}{\partial X_i} \Delta X_i + \frac{1}{2} \sum_i \sum_j \frac{\partial^2 E}{\partial X_i \partial X_j} \Delta X_i \Delta X_j + \ldots \qquad (5)$$

One can therefore calculate the change in neural energy
ΔE due to the state changes ΔX_i of the neurons by
ΔE = E({X_i + ΔX_i}) − E({X_i}). From Eq. (5), keeping
only the first term,

$$\Delta E \approx \sum_i \frac{\partial E}{\partial X_i} \Delta X_i. \qquad (6)$$

Using Eq. (4) in Eq. (6), we see that, if ∂E/∂X_i is
negative, we set the state X_i of neuron i to one (so that
ΔX_i is zero or positive), and if ∂E/∂X_i is positive, we
set X_i to zero (so that ΔX_i is zero or negative). Every
term in the sum of Eq. (6) is therefore zero or negative.
Thus, ΔE is negative. In Eq. (6) the higher-order terms
were omitted. Thus, the prior analysis is only approximately
correct, and the energy of the system actually can
increase on some iterations. It turns out that this is
more beneficial than harmful, since it allows the system
to escape from shallow local minima.
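
The sign argument above can be illustrated with a short sketch of the binary update rule and the first-order energy change. The function names are illustrative, and leaving a neuron unchanged when its derivative is exactly zero is an assumption, since Eq. (4) does not cover that case.

```python
def binary_update(x, grad):
    """Set X_i = 1 where dE/dX_i < 0 and X_i = 0 where dE/dX_i > 0.

    A neuron whose derivative is exactly zero keeps its state (assumed).
    """
    return [1 if g < 0 else 0 if g > 0 else xi for xi, g in zip(x, grad)]


def first_order_change(x, x_new, grad):
    """First-order estimate of the energy change, sum_i (dE/dX_i) * dX_i."""
    return sum(g * (xn - xo) for g, xn, xo in zip(grad, x_new, x))
```

With `x = [0, 1, 1, 0]` and `grad = [-2.0, 3.0, -1.0, 0.5]`, the update gives `[1, 0, 1, 0]` and the first-order change is -5.0; only the neglected higher-order terms can make the true energy change positive.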

Since we express the multitarget tracking problem
as a constrained optimization problem, it is also possible
to use conventional (non-neural) optimization
techniques. However, such techniques are not suitable
for optical implementation, and generally require
much more computation than the Hopfield net. We
therefore restrict our attention to the techniques
described in this section.

III.  Problem  Definition  and  Case  Study 

The minimization problem we consider is a multitarget
tracking problem. The scenario we consider
assumes:

(1) N_T targets are to be tracked, with N_T known and
fixed (being determined by the track initiator).

(2) There are N_M measurements each time step or
frame of data. N_M is fixed and is the maximum number
of measurements we will accept in any time frame.
This is achieved as explained below. If the number of
peaks (measurements) is less than N_M, we lower the
detection threshold or insert artificial measurements
to ensure that at each time we have N_M ≥ N_T
measurements.

(3) The targets do not accelerate appreciably during
the time steps under investigation and thus their
trajectories are approximately straight lines.

(4) Each target corresponds to no more than one
measurement at each time.

(5) Each measurement is due to no more than one
target at each time. (That is, we ignore crossing
targets for now.)

(6) At each time step, each target must be assigned
to one measurement. Our selection of N_M in item (2)
ensures that N_M ≥ N_T so that this rule can be satisfied.

The optimization problem is to assign one track to
each target, i.e., for each time step one of the detected
measurements is assigned to each target. The
objective is to find the N_T best straight lines in the
given data.

IV.  Problem  Formulation 

We first present our notation, introduce the distance
measures we wish to minimize, and then develop a
neuron energy description. We denote the measured
position vectors at time steps a, a + 1, and a + 2 by
r_α^a, r_β^(a+1), and r_γ^(a+2), respectively. The subscripts α, β, and γ
are used to refer to a particular one of the N_M different
measurements at a given time step, the time steps
being indexed by the superscript. We denote the vector
difference between a specific measurement (one of
the α) at time step a and one of the β measurements at
time step a + 1 by d_αβ^a = r_α^a − r_β^(a+1). Similarly, d_βγ^(a+1)
denotes the vector distance between a measurement β
at time step a + 1 and a measurement γ at a + 2. In
terms of these vector distances, the vector distance
measure we wish to minimize for a sequence of three
time steps for all measurements is

$$D_{\alpha\beta\gamma}^{a} = \left\| \mathbf{d}_{\alpha\beta}^{a} - \mathbf{d}_{\beta\gamma}^{a+1} \right\|. \qquad (7)$$

The minimum of Eq. (7) assigns one measurement in
each of time steps a, a + 1, and a + 2 to the same target.
Note that D in Eq. (7) is the norm of a vector difference.
This ensures that two successive distance vectors
(for time steps a and a + 1) should be collinear to
minimize D; that is, D is minimized for straight line
tracks. With equal time step increments and a
straight line trajectory with no acceleration, the two
distance vectors will be equal (for true target
measurements). Thus D will be 0 for the case of three collinear
and evenly spaced measurements in three successive
frames.
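
The distance measure can be checked numerically with a small sketch. It assumes D is the Euclidean norm of the difference between the two successive displacement vectors, as the text describes; the function name `track_distance` and the use of tuples for positions are illustrative choices.

```python
import math


def track_distance(r_a, r_b, r_c):
    """Norm of the difference between two successive displacement vectors.

    r_a, r_b, r_c are measured positions in frames a, a + 1, and a + 2.
    """
    d1 = [p - q for p, q in zip(r_a, r_b)]  # displacement between frames a and a + 1
    d2 = [p - q for p, q in zip(r_b, r_c)]  # displacement between frames a + 1 and a + 2
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(d1, d2)))
```

Three collinear, evenly spaced points such as `(0, 0)`, `(1, 1)`, `(2, 2)` give a distance of zero, while moving the last point to `(2, 3)` gives a distance of 1.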

The measure D will now be used as a basis for the
description of an energy function E which, when
minimized, solves our problem. We label each binary
neuron with three indices, such as X_iαa, where i is the
target index, α is the measurement index, and a is the
time step index. This neuron is active (i.e., X_iαa = 1) if
the ith target is associated with a specific position
vector r_α^a (one of the measurements α) at time step a,
and otherwise X_iαa = 0. The energy function to be
minimized for the optimization problem in Sec. III can
be written as

$$\begin{aligned}
E ={}& A_1 \sum_{\alpha} \sum_{a} \sum_{i} \sum_{j \neq i} X_{i\alpha a} X_{j\alpha a} \\
&+ A_2 \sum_{i} \sum_{a} \sum_{\alpha} \sum_{\beta \neq \alpha} X_{i\alpha a} X_{i\beta a} \\
&+ A_3 \sum_{a} \Big( \sum_{i} \sum_{\alpha} X_{i\alpha a} - N_T \Big)^2 \\
&+ A_4 \sum_{a} \sum_{i} \sum_{\alpha} \sum_{\beta} \sum_{\gamma} D_{\alpha\beta\gamma}^{a} X_{i\alpha a} X_{i\beta(a+1)} X_{i\gamma(a+2)}, \qquad (8)
\end{aligned}$$

where A_1 through A_4 are positive constants. Their choice is
discussed in Sec. VI.

We now discuss the terms in this energy function to
provide an understanding of it. We first note that all
terms are positive semidefinite. Consider the first
term: each term in this sum is either 0 or 1 (since
binary neurons are employed). Note that X_iαa and
X_jαa denote neuron states associated with targets i and
j (any of the N_T targets) and some measurement α (of
the N_M) at time step a. The first term contains the
sum over the measurements and time steps of products
of these neurons. Since only the target index (i or j)
differs in the X_iαa X_jαa product, a given term in term 1
can be one only if targets i and j are assigned to the
same measurement α at the same time step a. Since
we sum over α, a, i, and j, term 1 is zero if and only if at
each time step no measurement is assigned to more
than one target. Thus, minimization of this first term
occurs when each measurement is associated with no
more than one target. It therefore enforces condition
(5) in Sec. III.

The second term in Eq. (8) consists of a sum of
products with different measurement indices (α and β)
on the neurons. It is therefore minimized when one
target is associated with no more than one measurement
(α or β) at the same time step a. Thus term 2 is
included so that the system satisfies condition (4) in
Sec. III.

We next consider term 3. For a fixed time a, the set
of neurons X_iαa for the various target indices i and
measurement indices α can be arranged in a matrix
with horizontal index α and vertical index i. When
one measurement has been assigned to each target, this
matrix has a single one in each row i, indicating which
measurement is assigned to this target. Thus, Σ_i Σ_α
X_iαa for a fixed time a is the number of nonzero entries
of the matrix above. This sum equals N_T when
condition (5) of Sec. III is satisfied. Thus, term 3 is
minimized when all N_T targets are each associated with one
measurement at each time step; it is included so that
condition (6) in Sec. III is satisfied. Hence, the first
three terms in Eq. (8) ensure that the
measurement-target matching is admissible.
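
The role of the three constraint terms can be illustrated with a short sketch that evaluates them for a candidate assignment. It assumes the terms have the forms described in the text (products over different targets, products over different measurements, and the squared deviation of the number of active neurons from the number of targets), with the weights A_1 to A_3 left out; the function name and the nested-list layout `X[a][i][alpha]` are illustrative.

```python
def constraint_terms(X, n_targets):
    """Return the unweighted values of terms 1, 2, and 3.

    X[a][i][alpha] is 1 if target i is assigned to measurement alpha in
    frame a, and 0 otherwise.
    """
    t1 = t2 = t3 = 0
    for frame in X:
        n_meas = len(frame[0])
        for alpha in range(n_meas):
            col = [row[alpha] for row in frame]
            # Sum over i != j of X_i * X_j equals (sum X)^2 - sum X^2.
            t1 += sum(col) ** 2 - sum(v * v for v in col)
        for row in frame:
            # Same identity over pairs of different measurements for one target.
            t2 += sum(row) ** 2 - sum(v * v for v in row)
        t3 += (sum(map(sum, frame)) - n_targets) ** 2
    return t1, t2, t3
```

An admissible frame such as `[[1, 0, 0], [0, 0, 1]]` for two targets gives `(0, 0, 0)`, while assigning both targets to the same measurement makes the first value positive.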

In term 4, both the time step and measurement
indices on the three neurons in the product differ.
This term selects a measurement (from each of the sets
labeled by $\alpha$, $\beta$, and $\gamma$) in each of three successive time
frames for each target. The measurement-target
pairs are selected such that these three measurements
lie closest to a straight line, with the search done for
each target and for each measurement $\alpha$ in each frame.
To see how this is accomplished, recall that D in Eq.
(7) is calculated for three successive time steps. The
three neurons in term 4 in Eq. (8) have their time
indices appropriately stepped. For a fixed time frame,
the three X terms can each be represented by a matrix
with horizontal index $\alpha$ and vertical index i, as before.
For a fixed target i, the neuron choices to be considered
occur in the same row (row i) in each of these matrices.
Each row of each matrix should have only a single one,
because of condition (5) in Sec. III. The best choice
for the position of these ones is determined as follows.

Fig. 1. Block diagram of the multitarget tracking neural processor.

Consider that there are ten measurements in each
frame. We select a measurement $\alpha$ in frame a. For
the single measurement $\alpha$ chosen, there are ten possible
choices for the measurement (indexed by $\beta$) in the
second frame and for each of these there are ten
possibilities for the third measurement indexed by $\gamma$. For
each of these 100 combinations, the three X factors in
term 4 could all be 1, but for only one set of these will D
be small. Minimization of E for this term ensures that
the set of three successive measurements chosen (for
each measurement in the first frame) will be the set
closest to a straight line. The summation over a implies
that this minimization is repeated for each time
step (using the prior two frames of data).
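
The search over the 100 ($\beta$, $\gamma$) combinations for one first-frame measurement can be sketched in a few lines of Python. The form of D in Eq. (7) is not reproduced here, so this sketch assumes a stand-in measure: the norm of the second difference p1 - 2*p2 + p3, which is zero for three equally spaced points on a straight line. The function names are hypothetical, and the neural net reaches this choice by energy minimization rather than by explicit enumeration.

```python
import itertools
import math


def line_deviation(p1, p2, p3):
    """Assumed stand-in for D: norm of the second difference p1 - 2*p2 + p3."""
    return math.sqrt(sum((a - 2 * b + c) ** 2 for a, b, c in zip(p1, p2, p3)))


def best_triple(p1, frame2, frame3):
    """Return the (beta, gamma) index pair that minimizes the deviation
    for a fixed first-frame measurement p1."""
    candidates = itertools.product(range(len(frame2)), range(len(frame3)))
    return min(
        candidates,
        key=lambda bg: line_deviation(p1, frame2[bg[0]], frame3[bg[1]]),
    )


# Two constant-velocity targets whose measurements are listed in a
# different order in each frame.
frame1 = [(0.0, 0.0), (10.0, 5.0)]
frame2 = [(12.0, 4.0), (1.0, 1.0)]
frame3 = [(2.0, 2.0), (14.0, 3.0)]

pair_a = best_triple(frame1[0], frame2, frame3)
pair_b = best_triple(frame1[1], frame2, frame3)
```

With ten measurements per frame the same call scans the 100 combinations mentioned in the text.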

A  possible  variation  to  the  energy  function  in  Eq.  (8) 
is  the  omission  of  the  first  term.  Then,  one  would  not 
be  enforcing  the  assignment  of  only  one  target  to  each 
input  measurement  (this  case  arises  if  two  paths  of 
different  tracks  cross).  In  our  simulations,  this  term 
was  retained.  We  intend  to  do  more  work  on  the 
target-crossing  problem  in  the  future. 

Note that the energy function in Eq. (8) enables us to
tolerate both spurious measurements and the absence
of measurements for some targets at some time steps.
To achieve this, let Nm be the largest number of
measurements at any of the Np time steps. If a given
frame has M < Nm target measurements, we set Nm - M
measurements equal to the zero vector. (In our
reference frame the zero vector lies in the center. This
choice minimizes the effect of missing measurements
on D, for the case of a uniform spatial distribution of
targets.) Spurious measurements (if they have random
position vectors, as they should) will not be assigned
to true target tracks because of the energy
minimization step. If both spurious and missing
measurements are present, we have found that the
spurious measurements are assigned to the same
tracks as the missing measurements (zero vectors).
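
As an illustration of the padding step, the following Python sketch fills every frame up to Nm measurements with zero vectors. The function name and the tuple representation of a measurement are assumptions made for this example only.

```python
def pad_frames(frames, dim=3):
    """Pad each frame with zero vectors so that all frames hold Nm
    measurements, where Nm is the largest count found in any frame."""
    n_m = max(len(frame) for frame in frames)
    zero = (0.0,) * dim
    return [list(frame) + [zero] * (n_m - len(frame)) for frame in frames]


# Frame 0 is missing one measurement (M = 1 < Nm = 2).
frames = [
    [(1.0, 2.0, 3.0)],
    [(1.0, 1.0, 1.0), (2.0, 2.0, 2.0)],
]
padded = pad_frames(frames)
```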

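The time evolution of the neurons is required to
result in a neural system that minimizes E in Eq. (8).
This is given by the derivative of Eq. (8), i.e.,

$$\frac{dE}{dX_{i\alpha a}} = 2A_1 \sum_{j \ne i} X_{j\alpha a} + 2A_2 \sum_{\beta \ne \alpha} X_{i\beta a} + 2A_3 \Big( \sum_{j} \sum_{\beta} X_{j\beta a} - N_t \Big) + \big[\text{term 4: three similar parts, each of the form } D\,X\,X \big]. \qquad (9)$$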

Figure 1 shows the block diagram of the neural
multitarget tracker described by Eqs. (4), (8), and (9). We
produce dE/dX in Eq. (9) from X and threshold dE/dX
as defined in Eq. (4) to produce the new X with E given
by Eq. (8) from which dE/dX is obtained in a closed
loop. In this design, the activities of the neurons
evolve according to Eqs. (9) and (4), and thus their
states will evolve to a steady-state energy minimum.
This minimum indicates which target should be
associated with which measurement at each time step.

Fig. 2. Schematic of an optical neural processor with (a) quadratic
energy terms and (b) cubic energy terms.


We now discuss how to realize the terms in Eq. (9) as
linear algebra functions. We consider the case of Nt =
10 targets, Nm = 10 measurements, and Np = 3 time
steps. There are Nt × Nm × Np = 300 neurons that
represent the different $X_{i\alpha a}$. We represent these
neurons as a vector x with elements $x_k$, where each value of
the index k denotes a different $(i,\alpha,a)$ combination. In
steady state, x will have thirty “one” entries (ten
measurement-target pairs for each of three time step
values a). Term 1 in Eq. (9) is the sum of a number of
such vectors and can thus be described as a
matrix-vector product $2A_1 T_1 x = y_1$. The elements $T_{kl}$ of the
binary connection matrix $T_1$ are described by

$$T_{kl} = T_{i\alpha a,\,j\beta b} = (1 - \delta_{ij})\,\delta_{\alpha\beta}\,\delta_{ab}, \qquad (10)$$

where the indices k and l denote the different sets of
target/measurement/time parameters $(i\alpha a)$ and $(j\beta b)$,
respectively. Since both k and l range over 300 values,
$T_1$ is a 300 × 300 matrix. This $y_1$ term has nonzero
contributions to the output only for indices
corresponding to different targets ($i \ne j$) but the same
measurement ($\alpha = \beta$) and time step ($a = b$).
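
A small Python sketch of this connection matrix is given below. It builds the binary matrix from the rule stated in the text (different targets, same measurement, same time step) and applies it to a neuron vector. The ordering of the flattened index k over (i, $\alpha$, a) is an assumption; any fixed ordering works, provided the same one is used for the rows and columns.

```python
from itertools import product


def build_t1(n_t, n_m, n_p):
    """Binary matrix with T[k][l] = 1 when k = (i, alpha, a) and
    l = (j, beta, b) have i != j, alpha == beta, and a == b."""
    idx = list(product(range(n_t), range(n_m), range(n_p)))
    t1 = [
        [1 if (i != j and al == be and a == b) else 0 for (j, be, b) in idx]
        for (i, al, a) in idx
    ]
    return idx, t1


def matvec(m, v):
    """Plain matrix-vector product."""
    return [sum(m_kl * v_l for m_kl, v_l in zip(row, v)) for row in m]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Two targets, two measurements, one time step: k runs over
# (0,0,0), (0,1,0), (1,0,0), (1,1,0).
idx, t1 = build_t1(2, 2, 1)
admissible = [1, 0, 0, 1]   # target 0 -> measurement 0, target 1 -> measurement 1
conflict = [1, 0, 1, 0]     # both targets claim measurement 0

penalty_ok = dot(admissible, matvec(t1, admissible))
penalty_bad = dot(conflict, matvec(t1, conflict))

# The 10 x 10 x 3 case from the text gives a 300 x 300 matrix.
_, t1_full = build_t1(10, 10, 3)
```

The quadratic form x·(T1 x) vanishes for the admissible assignment and is positive when two targets share a measurement, which is the behavior term 1 is meant to penalize.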

Terms 2 and 3 in Eq. (9) are other sums of vectors x
and can likewise be written as matrix-vector products
$2A_2 T_2 x = y_2$ and $2A_3 T_3 x = y_3$. For these first three
terms, the connection matrices are fixed and thus we
can form $Tx = (2A_1 T_1 + 2A_2 T_2 + 2A_3 T_3)x$ in a single
step with T and the constants A1-A4 fixed for all
problems.

Term 4 in Eq. (9) is more complex. Each of the
three parts of this term is similar and contains the
product of two different neuron states which we can
relabel as $X_k$ and $X_l$ times a tensor D of rank three. If
we think of $X_k$ and $X_l$ as components of a vector x, the
product of two different neuron states $X_k X_l$ for all k
and l is a matrix with components $X_k X_l$ (where k and l
are the row and column indices, respectively). This is
the vector outer product (VOP) matrix $xx^t$, where the
superscript t denotes the transpose. In terms of this
new vector labeling scheme, term 4 can be written as

$$y_{4j} = \sum_{k,l} D_{jkl} X_k X_l, \qquad (11)$$

where $y_{4j}$ denotes the jth component of term 4 in Eq.
(9). We can view D as a number of matrices. The sum
of products in Eq. (11) for a given j is the sum of the
point-by-point products of the elements of the VOP
matrix and one of the matrices in D. The result for all
j is a vector $y_4$, which is called (Ref. 9) the tensor-matrix inner
product, i.e.,

$$y_4 = D \cdot xx^t. \qquad (12)$$

Since this is not a simple matrix-vector product, it
cannot be calculated in the same way as the other three
terms in Eq. (9). In addition, D changes with the
input data whereas the matrix T in the other terms is
fixed. We now investigate how the linear algebra
operations and thresholding described in this section can
be done optically.
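
The tensor-matrix inner product of Eqs. (11) and (12) can be written out directly. The following Python sketch is an illustration only: the function names are hypothetical, and the entries of D are labeled by their own indices (for example, 213 stands for j = 2, k = 1, l = 3) so that the terms entering each sum are easy to read off.

```python
def outer(x):
    """Vector outer product (VOP) matrix with entries x[k] * x[l]."""
    return [[x_k * x_l for x_l in x] for x_k in x]


def tensor_matrix_inner(d, x):
    """For each j, sum the point-by-point products of the j-th matrix
    of D with the VOP matrix of x."""
    vop = outer(x)
    n = len(x)
    return [
        sum(d[j][k][l] * vop[k][l] for k in range(n) for l in range(n))
        for j in range(len(d))
    ]


# Illustrative rank-3 tensor whose value encodes its indices as digits.
d = [
    [[100 * (j + 1) + 10 * (k + 1) + (l + 1) for l in range(3)] for k in range(3)]
    for j in range(3)
]
x = [1, 0, 1]
y4 = tensor_matrix_inner(d, x)
```

For this x only the VOP elements (1,1), (1,3), (3,1), and (3,3) are nonzero, so each output component is the sum of the four corresponding entries of one matrix of D.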

V.  Optical  Architecture 

As has often been noted, connectionist architectures
are well suited for optical implementation since optical
systems easily achieve large numbers of interconnects
(Ref. 10). In the optical design of the Hopfield net by
Psaltis and Farhat (Refs. 11, 12), a matrix-vector multiplier was
sufficient. This optical realization is suitable for
implementing time evolution equations that are linear in
the neuron activities X (or equivalently, neural
systems and applications in which the energy function is
quadratic in X). Such an optical architecture is most
efficient if only non-negative connection matrices are
involved. Thus, the first two terms in Eq. (9) and the
positive definite part ($2A_3 \sum_j \sum_\beta X_{j\beta a}$) of the third
term can easily be calculated by unipolar optical
matrix-vector multiplications. Our optimization
problem involves one energy term which is cubic (term 4)
and a negative term (part of term 3). The optical
realization of these terms is now discussed.

Equation (9) can be realized on the optical system of
Fig. 2 as we now detail. This optical system is best
drawn in two parts: Fig. 2(a) (which performs a
matrix-vector multiplication and implements terms 1-3)
and Fig. 2(b) (which implements term 4). The data
plane B1 is common to both parts of the optical system.
For simplicity, only the essential lenses are shown. In
Fig. 2(a), the vector data on a 1-D bistable device (Ref. 13) B1
is the current neuron state x. It is imaged vertically
and expanded horizontally by L1 onto a 2-D spatial
light modulator (SLM1) which contains the matrix
interconnection data T in Eq. (10). This is a fixed
interconnection pattern that is independent of the
data. Thus, it can be recorded on film and need not be
altered. The light leaving SLM1 represents the
point-by-point products $T_{ij}x_j$. This light is integrated
vertically and input back to B1 (by four mirrors M in the
version shown). This forms the matrix-vector product
$Tx = (\sum_j T_{1j}x_j, \sum_j T_{2j}x_j, \ldots, \sum_j T_{nj}x_j)$.

The same B1 is also present in the system of Fig.
2(b). In Fig. 2(b), its output is expanded horizontally
by L1 and rotated by 90° and expanded vertically [by
the beam splitter (BS), mirror (M), and lens L2].
These two expanded patterns are superimposed on a
second bistable device B2 with horizontal and vertical
indices k and l. The light intensity incident on
element (k,l) of B2 is $X_k + X_l$. When the threshold for B2
is set to trigger only if both inputs are active, the B2
output is the binary VOP of the B1 data. This VOP
matrix is imaged onto SLM2 where it multiplies several
multiplexed D matrices element by element. The
computer-generated hologram (CGH) behind SLM2
directs the proper element-by-element products to
different portions of B1. To implement term 4 on this
system, we form x on B1, the VOP matrix $xx^t$ on B2,
and the tensor-matrix inner product $y_4 = D \cdot xx^t$ on
B1 using SLM2 and the CGH. We now consider the
multiplexing data format on SLM2 and the CGH used.
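
The way a thresholded sum yields the binary outer product can be checked numerically. In the Python sketch below, the threshold value 1.5 is an assumption chosen to lie between one active input (intensity 1) and two active inputs (intensity 2); the function name is hypothetical.

```python
def binary_vop_by_threshold(x, threshold=1.5):
    """Element (k, l) receives x[k] + x[l] and switches on only when
    the sum exceeds the threshold, i.e., when both inputs are active."""
    return [[1 if x_k + x_l > threshold else 0 for x_l in x] for x_k in x]


x = [1, 0, 1]
vop_thresholded = binary_vop_by_threshold(x)
vop_product = [[x_k * x_l for x_l in x] for x_k in x]
```

For binary neuron states the thresholded sum and the product agree element by element.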

In our simplified index notation (Sec. IV), the
elements of the $y_4$ vector output due to term 4 are given by
Eq. (11). Consider the case when j, k, and l range from
1 to 3 in $D_{jkl}$, $X_k$, and $X_l$. The neuron vector x then has
three elements and the VOP matrix on B2 is 3 × 3.
Each element $X_k X_l$ must multiply the three different
elements $D_{jkl}$ corresponding to the three different
values that j can take given the indices k and l. One
possible multiplexed arrangement for the SLM2 data
$D_{jkl}$ is shown in Fig. 3, with the VOP elements in row 1
of B2 corresponding to (k,l) = (1,1), (1,2), (1,3) and the
elements of row 2 corresponding to (k,l) = (2,1), (2,2),
(2,3), etc. The spatial size of the elements on B2 and
SLM2 and the imaging optics (not shown) from B2 to
SLM2 are such that VOP element (1,1) illuminates the
first three elements in column 1 of Fig. 3 (i.e., $D_{111}$,
$D_{211}$, and $D_{311}$), VOP element (1,2) corresponding to
$X_1 X_2$ illuminates the first three elements of column 2
($D_{112}$, $D_{212}$, and $D_{312}$), etc. The bold lines in Fig. 3
indicate regions of SLM2 illuminated by one element
of B2. Since D is a tensor of rank 3, it is not possible to
assign one spatial dimension (horizontal or vertical) to
each rank (as is possible with a tensor of rank 2, i.e., a
matrix). This arrangement in Fig. 3 multiplies each
VOP element $X_k X_l$ by the three possible j values in
$D_{jkl}$ and thus forms the point-by-point product of the
VOP matrix and the different D matrices. The CGH
behind SLM2 focuses all products with the same j onto
the same region of B1 (i.e., for the 3 × 9 example in Fig.
3, it sums the light leaving the first, fourth, and seventh
rows, the light leaving the second, fifth, and eighth
rows, etc). This forms the sum over k and l of $D_{jkl} X_k X_l$


    D111  D112  D113
    D211  D212  D213
    D311  D312  D313
    D121  D122  D123
    D221  D222  D223
    D321  D322  D323
    D131  D132  D133
    D231  D232  D233
    D331  D332  D333

Fig.  3.  Details  of  the  values  written  on  SLM2. 


for each j in a different region of B1. A CGH could be
placed between B2 and SLM2 to replicate the B2 data
onto the proper regions of SLM2 such that the CGH
behind each region (3 × 3 region for the example in Fig.
3) of SLM2 could be a simple spherical lens plus a
grating at the required spatial frequency and
orientation. However, since the CGH is fixed and
independent of the input data, it appears that it can be
fabricated on film with sufficient resolution to allow one
CGH to be used with improved light budget efficiency.
We detail this later.
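
The multiplexed arrangement of Fig. 3 and the row sums formed by the CGH can be mimicked with index arithmetic. The Python sketch below assumes 0-based indices, with SLM2 row number n*k + j and column number l for the element $D_{jkl}$ (n = 3 in the example); the function names are hypothetical, and the entries of D are labeled by their indices purely for illustration.

```python
def slm2_layout(d):
    """Arrange D[j][k][l] as in Fig. 3: row = n*k + j, column = l."""
    n = len(d)
    return [[d[j][k][l] for l in range(n)] for k in range(n) for j in range(n)]


def cgh_sum(layout, x):
    """Weight each SLM2 element by the VOP element x[k]*x[l] that
    illuminates it and add together all rows sharing the same j."""
    n = len(x)
    y = [0] * n
    for row, values in enumerate(layout):
        k, j = divmod(row, n)
        for l, d_val in enumerate(values):
            y[j] += d_val * x[k] * x[l]
    return y


# Entry 213 stands for j = 2, k = 1, l = 3 (1-based, as in Fig. 3).
d = [
    [[100 * (j + 1) + 10 * (k + 1) + (l + 1) for l in range(3)] for k in range(3)]
    for j in range(3)
]
layout = slm2_layout(d)
y4 = cgh_sum(layout, [1, 0, 1])
```

Rows 1, 4, and 7 of the layout (0-based rows 0, 3, 6) all carry j = 1, which is why summing them gives the first component of the output.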

Next, we consider how the negative part of term 3 in
Eq. (9) is handled. Recall that B1 is common to both
parts of the system [Figs. 2(a) and (b)]. Thus, the
input to B1 contains the sum of all the non-negative
terms in $dE/dX_{i\alpha a}$ in Eq. (9), i.e., the jth element on the
input side of B1 is $dE/dX_j + 2A_3 N_t$ [where j
corresponds to $(i\alpha a)$]. Thus, we set the threshold of B1 to
be $2A_3 N_t$ and hence achieve the subtraction of the
positive and constant $2A_3 N_t$ portions of term 3 by
thresholding without the need to compute negative
numbers and the proper neuron vector x emerges from
B1. We note that the contents of SLM2 need not
change between iterations (in minimizing the energy
E) for a given set of input measurements. Its contents
change for each set of input measurements, but such
distance calculations are needed for most multitarget
tracking problems.

We next consider an improved version of the system
of Fig. 2(b) with reduced space-bandwidth product for
B2 and SLM2. To see the significance of this, consider
the case of Nt = 6, Nm = 7, and Np = 5. The vector x
has 6 × 7 × 5 = 210 components, the VOP has 210 × 210
components, and SLM2 requires 210 × 210 × 210
pixels. Clearly, for large values of Nt, Nm, and Np,
this architecture becomes unrealistic. Fortunately,
not all the terms in $xx^t$ are required and most elements
of D are zero. In Eq. (9) we see that only those values
of $X_{i\alpha a}X_{j\beta b}$ with i = j are used. To take advantage of
this, we divide the vector x into Nt smaller vectors $z_i$ (i
= 1, ..., Nt), one for each target i, with each vector of
size Nm × Np. Thus, the full vector x can be written as
$x^t = (z_1^t, z_2^t, \ldots, z_{N_t}^t)$. This allows us to calculate the
VOP $z_i z_i^t$ of each vector separately, multiply each by
the proper elements of D point by point, and sum up
the products as indicated in Eq. (9).

We now discuss how to efficiently separate the
tensor D into several smaller matrices. Recall that the
full tensor D is described by NtNmNp matrices, each
of dimension NtNmNp. However, the only nonzero
elements of these matrices occur for i = j and do not
depend on the value of i or j (the entries of D do not
depend on the target, since the calculations of the
elements of D involve only distance calculations on the
measurements). The fact that only three adjacent
time steps a are included in the D calculation further
reduces the number of nonzero entries. We chose to
separate the D tensor into Nm matrices, each of
dimension NmNp. To see how this is possible, recall that
each distance measure is associated with three
measurements ($\alpha$, $\beta$, and $\gamma$) at three different sequential
time steps. A given set of pairs of two measurements
at two different (not necessarily successive) time steps
is described by a matrix of dimension NmNp. There
are Nm possibilities for the third measurement in the
other time step of the three in sequence. Thus there
are Nm such matrices, each of dimension NmNp, that
describe the tensor data D.

This division is attractive since each of the Nt = 6
reduced size 35 × 35 (when NmNp = 35) VOP matrices
can now be multiplied by each of the Nm = 7 D
matrices. After summation of the proper point-by-point
products, the output is Nt = 6 vectors, each of
dimension NmNp = 35, i.e., the 6 × 35 = 210 element neuron
state vector. Thus, this new arrangement requires
that we calculate six 35 × 35 VOPs (i.e., B2 requires
only 6 × 35 × 35 = 7350 elements), and SLM2 is only of
size 7 × (35 × 35). We have thus reduced the
space-bandwidth product of SLM2 by a factor of over 1000.
B1 is still a 1-D SLM of size NtNmNp = 210 elements
(one for each element of x). This size for x is still much
less than for cases when one assigns one neuron for
each possible single target state (position in x and y)
for every time step n (i.e., xyn neurons, where x and y
are the number of pixels in the (x,y) projections of the
measurement space, respectively). Our assignment of
one neuron for each measurement for each target for
each time results in a much smaller number (since the
number of measurement points per frame is usually
much less than the number of 2-D pixels in one image
frame).
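
The element counts quoted above follow from simple arithmetic, reproduced in the Python sketch below. The function and key names are hypothetical.

```python
def device_sizes(n_t, n_m, n_p):
    """Element counts for the full and the partitioned arrangements."""
    n = n_t * n_m * n_p      # length of the neuron vector x
    block = n_m * n_p        # length of each per-target vector z_i
    return {
        "x": n,
        "b2_full": n * n,
        "slm2_full": n ** 3,
        "b2_reduced": n_t * block * block,
        "slm2_reduced": n_m * block * block,
    }


sizes = device_sizes(6, 7, 5)
reduction = sizes["slm2_full"] / sizes["slm2_reduced"]
```

For Nt = 6, Nm = 7, and Np = 5 the SLM2 requirement drops from 9,261,000 to 8575 elements, a factor of 1080.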

The Nt VOPs of the partitioned B1 data can be
produced on B2, by replacing L1 in Fig. 2 by a CGH
that is a set of Nt cylindrical lenses with gratings at
different orientations and with different spatial
frequencies. This L1 lenslet CGH array focuses the Nt
sections of x on the vertical B1 device onto Nt
different horizontal regions of B2 and replicates the vector
data in each of the Nt sections horizontally at B2. As
before, L2 expands x vertically onto B2. After
thresholding, the result on B2 is the desired Nt VOPs of the
NmNp element vector in each of the Nt regions on B1.
These VOPs thus emerge from B2 as Nt matrices
horizontally separated, with each matrix of dimension
NmNp.

Fig. 4. Alternative optical architecture to implement cubic energy
terms.

The architecture in Fig. 2 (even with the reduced
SLM2 and B2 resolution requirements) requires fast
nonlinear optical devices for B1 and B2 (since they
limit the speed for one iteration of the system). Since
such devices are not yet readily available, we present
the alternative architecture of Fig. 4 that does not
require a nonlinear optical device. Rather, it
computes the vector outer product by feeding the z vectors
to the rows and columns of an electroded SLM such as
a ferroelectric liquid crystal (Ref. 14). Thus, SLM3 in Fig. 4
can be substituted for the 2-D bistable device B2 in Fig.
2(b). The outer products formed by SLM3 are imaged
onto SLM2, where they multiply D, and the CGH
focuses the terms belonging to the same sum to the
same point on the detector array D1 which now
replaces B1 in Fig. 2. The thresholding is done
electronically by an array of operational amplifiers fed from
the detectors D1. These detector outputs provide the
electronic inputs to SLM3 in Fig. 4.

The architectures which we have introduced in this
section are quite complicated. We do not find this
surprising: it is well known that the multitarget
tracking problem is very hard. Therefore, any parallel
architecture which attempts to solve this problem in real
time will probably be rather sophisticated.

VI.  Simulation  Results 

The above neural net and algorithm were simulated
for the case of Nt = Nm = 4 and Np = 5. A 256 × 256 ×
256 3-D (x,y,z) space was used. The initial four
measurement positions were chosen from a random
number generator. The velocities and directions of each
target were similarly chosen. Np = 5 equally spaced
time steps were generated for each target. The target
directions were evenly distributed over 360°. The
tracks generally ranged in length (in a 2-D projection) by a
factor of 5:1 with the longest track of five time steps
occupying ~70% of the field of view. To simulate
imperfect measurements, each position was perturbed
by n% of noise. This was achieved by adding a
uniformly distributed random number to the
measurement. This produced a random variation in the
location of the measurement by at most n% of its distance
from the origin (the center of the 256 × 256 × 256
space).

For each run (corresponding to a given value n% of
noise), ten different initial data conditions (initial
target positions, velocities, and directions) were used.
Thus, for each value of noise, ten sets of four target
tracks were processed. The initial conditions (the
neuron states at which the neurons were started in the
processing) were chosen by randomly perturbing the
all-equal condition (in which each coordinate is
assigned to each target with equal probability). This is
detailed more fully in Hopfield and Tank (Ref. 6) and
motivation for randomizing initial conditions is also provided
there. Ten different random perturbations were used
in the ten simulations in each run. The A1-A4
coefficients were chosen to equally weight the first three
terms in Eq. (9), i.e., A1 = A2 = A3 = 15 with A4 chosen
to be less (A4 = 1.8). Larger A1-A3 values were used to
give more weight to the first three terms in Eq. (9), i.e.,
we must have the proper form for the matrix $X_{i\alpha a}$. A1
and A2 should be chosen equal (since these terms
correspond to enforcing the correct row and column
structure, respectively, and are thus equivalent by
symmetry). A3 could differ from these terms since it
multiplies a different type of term; on the other hand,
the first three terms in Eq. (9) have similar roles and it
was found that A3 = A1 = A2 gave good results. Term 4
is given less weight (A4 = 1.8) since it involves the sum
of more products than do the other terms and
satisfying conditions (1), (2), (4), (5), and (6) in Sec. III (terms
1-3) is essential. No detailed optimization of A1-A4
was attempted.

The coefficients of the tensor D and the connection matrix (T)
were calculated. The threshold 2A3Nt in term 3 in Eq.
(9) was slightly increased (from 2A3Nt to 2A3Nt +
0.035). We found this to be helpful when noise is
present. Such an increase compensates for the
neglected higher-order terms in the Taylor expansion in
Eq. (6). In the simulation, the neural activities were
updated as they were calculated, i.e., the state of the
first neuron was calculated (using the most recent
values of all other neurons) and then the state of
neuron one was updated before calculating the state of
neuron two, etc. This better models (Ref. 8) a continuous
time rather than a discrete time system. After one set
of updates of the neurons, a new iteration commenced
until the neural net converged. The serial mode
neural updating results in faster convergence than if all
new neuron states are calculated in parallel and
simultaneously fed back, as is usually done (Ref. 7).
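
The difference between serial and parallel updating can be illustrated on a toy problem. The Python sketch below is not the simulation described above: it assumes a two-neuron net with a quadratic energy, a hypothetical weight matrix w and bias b, and a simple rule that switches a neuron on when its energy gradient is negative (a stand-in for the threshold of Eq. (4), which is not reproduced here). The iteration cap of 50 mirrors the limit used in the simulations.

```python
def gradient(w, b, x, k):
    """Gradient of the assumed energy E = x.W.x/2 - b.x with respect to x[k]."""
    return sum(w[k][l] * x[l] for l in range(len(x))) - b[k]


def serial_sweep(w, b, x):
    """Update neurons one at a time, each using the latest states."""
    x = list(x)
    for k in range(len(x)):
        x[k] = 1 if gradient(w, b, x, k) < 0 else 0
    return x


def parallel_sweep(w, b, x):
    """Compute all new states from the old states and feed them back together."""
    return [1 if gradient(w, b, x, k) < 0 else 0 for k in range(len(x))]


def run(sweep, w, b, x, max_iter=50):
    """Iterate until the state is unchanged; return (state, iterations),
    with iterations set to None if no convergence within max_iter."""
    for iteration in range(1, max_iter + 1):
        new_x = sweep(w, b, x)
        if new_x == x:
            return x, iteration
        x = new_x
    return x, None


# Two mutually inhibiting neurons, both started in the off state.
w = [[0, 2], [2, 0]]
b = [1, 1]
serial_result = run(serial_sweep, w, b, [0, 0])
parallel_result = run(parallel_sweep, w, b, [0, 0])
```

In this toy case the serial sweep settles on a state with one active neuron, while the parallel sweep flips both neurons together on every iteration and never settles.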


Table I. Simulation Results for Tracking of 4 Targets Through 5 Time Steps

Fig. 5. Typical example of the time evolution of the neural states.


Table I shows the results obtained. Column 1 lists
the percentage noise (positional variation) introduced
into the measurements. Column 2 gives the
percentage of runs that converged in less than 50 iterations
and column 3 gives the average number of iterations to
convergence for these cases. We restricted the
number of iterations to 50. If convergence is not obtained
after 50 iterations, we call this an error. If the system
converged to the wrong set of measurement-target
pair assignments, this is also an error. As seen, the
neural net converged in much less than 50 iterations on
the average. Also, we found that whenever the net
converged, it converged to the correct
target-measurement matching. A proof of this remains to be derived
(but from these initial tests, one should be able to
accept results that converge with a high probability).
At low noise (n = 0% or 2.5%), the neural net converged
to the correct solution for all 40 target tracks (i.e.,
excellent performance was obtained). As noise
increased, correct convergence or two other conditions
occurred (the neural net wandered with no apparent
trend to convergence or it oscillated between two
states, one being the correct solution and the other
differing in one target-measurement pairing).

Figure 5 shows a representative example of the
evolution of the neurons to a stable state for one set of
data. In Fig. 5, each rectangle in the 5 × 5 grid shown
represents the 4 × 4 matrix $X_{i\alpha a}$ for i = 1-4, $\alpha$ = 1-4,
and a fixed (the assignments at one time step). The
matrices from left to right on one row correspond to
different time steps a = 1-5 and the matrices in the
different rows are the neuron states after various
numbers of iterations, as indicated. The horizontal axis of
each matrix has four divisions, corresponding to the
four different targets which are assigned to the
measurements. The four different measurements in a
given time frame occupy the various rows of the indicated
matrix. A dark spot at position (i,$\alpha$) indicates that
target i has been assigned to measurement $\alpha$. The
neural net has converged if the matrices are unchanged
between two adjacent iterations. The targets are
correctly tracked if $X_{i\alpha a}$ (for a fixed) has only one entry
per row and one entry per column and row and column
entries are the same in all time frames. This is because
the measurements for this example were generated
such that the same target was given the same number
in all time frames.

From Fig. 5, the target-measurement associations
appear random after the first iteration (i.e., the neural
net has assigned many of the measurements to only
one target and vice versa). They converge and
eventually reach a correct solution after 15 iterations.

VII.  Summary  and  Conclusion 

A neural net energy minimization formulation of
multitarget tracking with a reduced number of
neurons was presented. Cubic energy terms were present
and an optical architecture to implement this neural
net was described. Initial simulations indicate that
the algorithm has desirable performance, even in the
presence of random noise. We expect improved
performance when the optimal values for the A
coefficients are understood, and when the characteristics of
good initial neural states are known.

The formulation presented can be extended in a
variety of ways. Other sensors than the simple
position sensors we employed can be introduced and
non-straight tracks (i.e., accelerating targets) can be
considered. These extensions will not complicate the
energy formulation or optical architecture that we have
presented, since they require only that the D tensor be
calculated in a different manner.

The main reason conventional tracking algorithms
are so slow is that they enforce a serial structure
onto a process that is essentially parallel: the various
targets are being measured simultaneously, but are
processed one by one. The parallelism of our neural
architecture means that it can perform at a much
higher rate. We therefore believe that this
architecture is prototypical of the types of system that will
have to be used to successfully deal with a complicated
multitarget scenario.


Abstract

Multitarget  tracking  over  consecutive  pairs  of  time 
frames  is  accomplished  with  a  neural  net.  This  involves 
position  and  velocity  measurements  of  the  targets  and  a 
quadratic  neural  energy  function.  Simulation  data  are 
presented,  and  an  optical  implementation  is  discussed. 


Introduction 

Since its inception the Hopfield neural network [1]
has been used to solve a variety of problems, including
associative memories and multitarget tracking. A prior
multitarget tracking neural net tracked each set of three
consecutive time frames using a cubic energy function [2].
We now consider a similar realization utilizing the
Hopfield net and a quadratic energy function that leads
to a simpler optical implementation. These solutions are
preferable to other multitarget tracking neural nets that
require one neuron per image pixel [3].


Neural  Energy  Function 

The problem is to track multiple targets over two consecutive time frames. The position and velocity vectors of each target are assumed known via real-time measurements. It is further assumed that there is a one-to-one correspondence between the number of measurements and the number of targets in each time frame. That is, a given position and velocity vector pair is due to no more than one target, and a given target will produce only one position and one velocity vector. As our initial results indicate, unresolvable crossing targets and spurious measurements can still be handled when these assumptions do not hold; they are invoked only for simplicity in this particular investigation. Finally, it is assumed that no acceleration occurs between the two time frames, so that the velocity of each target is constant over the time span between samples.

The interconnection pattern linking the measurements of one frame with those of another is unique due to the one-to-one correspondence condition. The problem is solved with a neural net by assigning each neuron Xij to one of the interconnections between measurements i and j in the two frames. With N measurements per frame, N^2 neurons are required. The relative activity of each neuron indicates the validity of each connection between measurements in the two frames. Ideally, only those neurons associated with valid connections will be active, while all others are driven to zero by minimization of an appropriate neural energy function. One such energy function is given by

$$E(X) = C_1 \sum_i \sum_j X_{ij} D_{ij} + C_2 \sum_i \Big( \sum_j X_{ij} - 1 \Big)^2 + C_3 \sum_j \Big( \sum_i X_{ij} - 1 \Big)^2 \tag{1}$$

This is a quadratic neural energy equation which, when minimized, will determine the best measurement assignments. The coefficients were C1 = C2 = C3 = 1 in this case, but in general can be varied to weight the terms differently. Each term will now be explained.
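As an illustration only, the energy in (1) can be evaluated directly for a small matrix of neuron values. The sketch below assumes X and D are given as nested Python lists; the function name `energy` and the 2 x 2 example values are hypothetical and are not taken from our simulations.

```python
def energy(X, D, c1=1.0, c2=1.0, c3=1.0):
    """Evaluate the quadratic energy of Eq. (1) for neuron matrix X and bias D."""
    n_rows, n_cols = len(X), len(X[0])
    # First term: each active neuron is penalized by its bias D_ij.
    bias = sum(X[i][j] * D[i][j] for i in range(n_rows) for j in range(n_cols))
    # Second term: every row of X should sum to one.
    rows = sum((sum(X[i]) - 1.0) ** 2 for i in range(n_rows))
    # Third term: every column of X should sum to one.
    cols = sum((sum(X[i][j] for i in range(n_rows)) - 1.0) ** 2
               for j in range(n_cols))
    return c1 * bias + c2 * rows + c3 * cols


# Hypothetical two-target example: small bias on the diagonal pairing.
D = [[0.0, 5.0], [5.0, 0.0]]
correct = [[1.0, 0.0], [0.0, 1.0]]
swapped = [[0.0, 1.0], [1.0, 0.0]]
print(energy(correct, D), energy(swapped, D))
```

Both assignments satisfy the one-to-one terms, so only the bias term separates them; the pairing with the smaller position and velocity differences has the lower energy.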

The bias term Dij is derived from the position and velocity measurements in both time frames. Let P11 be the first measured position vector in the first time frame, and P12 be the first measured position vector in the second time frame. Also let the corresponding measured velocity vectors be represented by V11 and V12 respectively. The D11 term is then

$$D_{11} = A\,\|P_{11} - P_{12}\|^2 + B\,\|V_{11} - V_{12}\|^2 \tag{2}$$

While the multiplicative coefficients A and B may in general be different, they were made equal to give the position and velocity measurements equal weight. The Dij term is a measure of the difference in the position and velocity of the ith measurement in the first time frame and the jth measurement in the second time frame. Likewise, the Xij neuron corresponds to the connection between the ith and jth measurements. The first term in (1) thus serves to weaken the neurons which correspond to the largest (magnitude) changes in position and velocity. The last two terms in (1) are equally weighted; therefore the relative weights of the first term and the last two can be adjusted by the choice of A and B in (2).
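A minimal sketch of how the full set of bias terms could be computed from (2) follows. It assumes each measurement is a tuple of 3-D components; the helper names `sq_dist` and `bias_matrix` and the sample measurements are hypothetical, while A = B = 0.01 matches the value quoted for case 1.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bias_matrix(pos1, vel1, pos2, vel2, A=0.01, B=0.01):
    """D[i][j] per Eq. (2): measurement i in frame 1 vs. measurement j in frame 2."""
    return [[A * sq_dist(p1, p2) + B * sq_dist(v1, v2)
             for p2, v2 in zip(pos2, vel2)]
            for p1, v1 in zip(pos1, vel1)]


# Two hypothetical targets approaching each other along the x-axis.
pos1 = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
vel1 = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]
pos2 = [(1.0, 0.0, 0.0), (9.0, 0.0, 0.0)]
vel2 = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]
D = bias_matrix(pos1, vel1, pos2, vel2)
print(D)
```

The correct pairings give the smallest entries in each row and column of D, which is what allows the first term of (1) to favor them.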

The last two terms in (1) enforce the condition of one-to-one correspondence between measurements in the two frames. For every measurement i in the first frame there is no more than one corresponding measurement in the second frame, and for every measurement j in the second frame there is no more than one corresponding measurement in the first frame. This still allows the absence of a corresponding measurement, or more measurements in one frame than the other, in which case all neurons associated with the "extra" measurement will be driven to zero. This accommodates the scenario where a target just enters or leaves the field of view during one of the time frames. The final X solution should ideally have only N ones in it, where N is the smaller of the numbers of measurements in the two frames.

In the Hopfield neural net model, the time evolution of the neurons from discrete time step t = n to t = n + 1 can be given by

$$X_{kl}(n+1) = X_{kl}(n) - \eta\,\Delta X_{kl} \tag{3}$$

where n is the discrete time index and η is the step size. From the derivative of (1), then

$$\Delta X_{kl} = \frac{\partial E}{\partial X_{kl}} = D_{kl} + 2\Big(\sum_j X_{kj} - 1\Big) + 2\Big(\sum_i X_{il} - 1\Big) \tag{4}$$

where subscripts k and l have been introduced for clarity. Each neuron value is updated by iteratively subtracting this increment, scaled by η as in (3), from each Xkl until the net converges to a solution.
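The update in (3) and (4) can be sketched in a few lines. This is an illustration under stated assumptions rather than the code run on the neurocomputer: X and D are nested lists, C1 = C2 = C3 = 1, the function names `gradient` and `step` are hypothetical, and the neuron nonlinearity is deliberately left out, so values are not yet confined to the range zero to one.

```python
def gradient(X, D):
    """Increment of Eq. (4): dE/dX_kl for every neuron (C1 = C2 = C3 = 1)."""
    n_rows, n_cols = len(X), len(X[0])
    row_sums = [sum(row) for row in X]
    col_sums = [sum(X[i][j] for i in range(n_rows)) for j in range(n_cols)]
    return [[D[k][l] + 2.0 * (row_sums[k] - 1.0) + 2.0 * (col_sums[l] - 1.0)
             for l in range(n_cols)]
            for k in range(n_rows)]


def step(X, D, eta=0.3):
    """One iteration of Eq. (3): move each neuron against its gradient."""
    G = gradient(X, D)
    return [[X[k][l] - eta * G[k][l] for l in range(len(X[0]))]
            for k in range(len(X))]


# Hypothetical 2 x 2 case whose rows and columns already sum to one,
# so only the bias term D drives the update.
X0 = [[0.5, 0.5], [0.5, 0.5]]
D = [[0.0, 1.0], [1.0, 0.0]]
X1 = step(X0, D)
print(X1)
```

In this example the off-diagonal neurons, which carry the larger bias, are reduced while the diagonal neurons are unchanged after one step.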


Matrix-Vector  Formulation 

The doubly-subscripted neurons Xij can be thought of as elements of a matrix, with i as the row index and j as the column index. Each position in the matrix then represents a unique connection between measurements in the two time frames. The bias terms Dij can be thought of in a similar manner. This representation is convenient for verifying convergence, as the criterion of one-to-one correspondence is equivalent to having only one maximum neuron value per row and per column. The indices of each maximum (i and j) yield the desired interconnection information between the time frames. However, actual implementation of the network is far easier if the X and D values are described as elements of one-dimensional matrices, or vectors, with elements Xi and Di. Specifically, the first N elements of the X-vector are the X1j terms, the next N elements are the X2j terms, etc., assuming there are N measurements in each time frame; a similar column vector is used for the Dij terms.

Using the vector format, we implement the neural network in (4) by a matrix-vector multiplication and a vector addition (with the step size η of (3) folded into the increment),

$$\Delta X_i = \eta \Big( \sum_j M_{ij} X_j + D_i \Big). \tag{5}$$

The weight matrix M in (5) combines the two summations in (4), as we now detail by considering the Xkl terms in (4). For a given k, the first summation in (4) is the sum of the elements contained in row k of the X matrix. Likewise, for a given l, the second summation is the sum of all elements in column l of the matrix. Now consider the matrix-vector product of M and X. Both summations are satisfied for the case of 9 neurons by
where the factor of 2 in each summation in (4) has been included in M. In general this same block structure can be extended to accommodate more targets; there are N = 3 targets and N^2 = 9 neurons in the above example.

Figure 1: Optical implementation.

For the case of an unequal number of measurements (N1 measurements in the first frame and N2 measurements in the second), the same block matrix structure of M is retained, with each block matrix being N2 x N2 and with N1 block matrices in each row and column of M.
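One way to build a matrix with this block structure is sketched below. It assumes the row-by-row ordering of the X vector described above (element a of the vector is neuron Xkl with k, l = divmod(a, N2)); the function names `weight_matrix` and `increment` are hypothetical, and the step size η is omitted so that the result can be compared directly with (4).

```python
def weight_matrix(n1, n2):
    """M for n1 x n2 neurons: 2 for a shared row of X plus 2 for a shared column."""
    size = n1 * n2
    M = [[0.0] * size for _ in range(size)]
    for a in range(size):
        k, l = divmod(a, n2)          # neuron X_kl stored at vector index a
        for b in range(size):
            i, j = divmod(b, n2)      # neuron X_ij stored at vector index b
            if i == k:
                M[a][b] += 2.0        # contributes to the row-k summation
            if j == l:
                M[a][b] += 2.0        # contributes to the column-l summation
    return M


def increment(M, x, d):
    """Matrix-vector product plus bias vector: sum_j M_ij x_j + d_i."""
    return [sum(m * v for m, v in zip(row, x)) + di for row, di in zip(M, d)]


M9 = weight_matrix(3, 3)
print(M9[0])

# Unequal case: 2 measurements in frame 1, 3 in frame 2, zero position/velocity bias.
x = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
d = [0.0 - 4.0] * 6                   # bias with the two -2 constants absorbed
delta = increment(weight_matrix(2, 3), x, d)
print(delta[0])
```

The diagonal entries equal 4 because each neuron appears in both its own row sum and its own column sum. For the unequal case, the first entry of `delta` agrees with (4) evaluated by hand: 2(0.6 - 1) + 2(0.5 - 1) = -1.8.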

The -2 terms in each summation are combined with Di to give

$$D_i \leftarrow D_i - 4 \tag{7}$$

as in (5). Thus, (5) can be implemented as a matrix-vector multiplier with an added bias vector Di. This lends itself to an optical implementation as shown in Fig. 1. The neurons X are represented by a linear array of laser diodes or a linear spatial light modulator in plane P1. The weight matrix M is the two-dimensional matrix in P2, which is fixed and can be stored on film. The bias vector D is produced by another array of laser diodes or 1-D spatial light modulator in P3. The summed result is detected at P4 and fed back electronically to the P1 array for the next iteration. A nonlinear function (often referred to as the neuron function) is applied to the P4 output in Fig. 1 to keep the Xi values between zero and one. The nonlinear function we used is the sigmoid function

$$X_i = 0.5\Big(1 + \tanh\frac{u_i}{u_0}\Big), \tag{8}$$

where the ui are the detected values at P4 and u0 is a parameter which determines the slope of the function, or the degree of "binarization", of the neurons.
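The neuron function of (8) is easy to tabulate. The short sketch below uses only the standard `math` module; the function name `neuron` is hypothetical, and u0 = 0.2 is the value used in our runs.

```python
import math


def neuron(u, u0=0.2):
    """Sigmoid neuron function of Eq. (8); maps any detected value u into (0, 1)."""
    return 0.5 * (1.0 + math.tanh(u / u0))


# A smaller u0 gives a steeper, more nearly binary response around u = 0.
for u in (-1.0, -0.1, 0.0, 0.1, 1.0):
    print(u, round(neuron(u), 4), round(neuron(u, u0=0.05), 4))
```

The output is exactly 0.5 at u = 0 and saturates toward zero or one once |u| is several times u0.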


Simulation  Results 

One of the more difficult scenarios for multitarget tracking involves multiple crossing targets. A set of six crossing targets (denoted by A to F) with 3-D position and velocity vectors was created for use in simulation of the network. A total of five time frames (four pairs of measurements) were processed in the initial simulations. Fig. 2 shows the target positions in time frames 3 and 4. The target position vectors are shown projected onto the three major planes (x-y, x-z, y-z). Each of the six targets is represented by an arrow with the tail denoting the position in frame 3 and the head denoting the position in frame 4. The direction of each arrow indicates the target velocity vector direction, and the length of the arrow indicates the velocity vector magnitude. Target A is moving in the x-y plane along the x-axis, and thus its projection in the y-z plane is a point at the origin. Similarly, targets B and C are traveling along the y and z axes respectively (and thus appear as points at the origin in the x-z and x-y planes respectively). All six targets cross at the origin at different times between time frames 3 and 4.

Fig. 3 shows the x-y projections of the targets at time frames 3, 4 and 5. Each target is denoted by a separate symbol, with its position in time frame 3 denoted by the symbol labeled with a letter. For example, target A moves right to left along the x-axis as shown. Target C travels in the negative z direction (into the page) and thus appears as a point at the origin in all three time frames. The arrows indicate velocity vector direction. The scale of the axes in Fig. 3 differs from Fig. 2, in order to allow inclusion of all targets over the three time frames.

Figure 3: Time sequence of target positions (projected in the x-y plane) for three successive time frames.

The simulation of our algorithm was performed on a Hecht-Nielsen neurocomputer system using Anza Plus neurosoftware. This system allows direct implementation of the Hopfield neural network and other major neural net models. The bias vector D and the weight matrix M were computed offline and loaded into the neurocomputer. A new vector D is used for each pair of frames, while the weight matrix M remains the same for all pairs of frames. The network iterated until the neuron values converged (equal to within 0.01 for at least two consecutive iterations). The initial neuron states were randomized using a uniform zero-mean distribution with a maximum value of ±0.0003. Prior to randomization the initial neuron states were all set equal to 1/36 so that the sum of the N^2 = 36 neurons was equal to one. Ten different initial neuron vectors were used for each of the four pairs of time frames. (The iteration data were averaged over the ten runs, and negligible differences were obtained.) A graphical representation of the neuron states Xij in the two-dimensional matrix format is shown at four iteration steps in Fig. 4 for time frames 3 and 4. The indices in Fig. 4 are i = frame 3 (vertical) and j = frame 4 (horizontal) with the six measurements per frame assigned randomly.

Figure 4: Convergence of neuron values.

The size of the darkened squares indicates the neuron value, with a clear square denoting a value of zero, and a fully darkened square denoting a value of one. The initial (iteration 0) neuron values are too small to be resolved in the figure, since the maximum random variance is small. Successive iterations lead to the final stable pattern shown at iteration 70 for a single pair of time frames. This final Xij pattern illustrates the interconnection assignments of corresponding measurements in the two frames. (The indices associated with targets A to F in the two frames are shown in this final matrix.) A single one is present in each row and column, and they correctly associate the targets in the two frames. The smallest "true" neuron activity level was 0.924 and the largest "false" level was 0.01 (thus allowing easy thresholding).
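Reading the assignments out of a converged neuron matrix amounts to a threshold followed by a one-to-one check. The sketch below is illustrative: the threshold of 0.5 is an assumed value lying between the reported "true" and "false" activity levels, and the function names and the 3 x 3 matrix are hypothetical.

```python
def assignments(X, threshold=0.5):
    """Return (i, j) index pairs of neurons whose activity exceeds the threshold."""
    return [(i, j)
            for i, row in enumerate(X)
            for j, value in enumerate(row)
            if value > threshold]


def is_one_to_one(pairs):
    """True if no frame-1 index and no frame-2 index is used more than once."""
    rows = [i for i, _ in pairs]
    cols = [j for _, j in pairs]
    return len(set(rows)) == len(rows) and len(set(cols)) == len(cols)


# Hypothetical converged matrix for three measurements per frame.
X = [[0.01, 0.93, 0.00],
     [0.95, 0.01, 0.01],
     [0.00, 0.01, 0.97]]
pairs = assignments(X)
print(pairs, is_one_to_one(pairs))
```

Each returned pair (i, j) links measurement i in the earlier frame to measurement j in the later frame.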

Simulations were run for two different sequences of six targets. Case 2 was the scenario illustrated in Figs. 2-3. Case 1 is a less complicated crossing scenario (with only two or three targets crossing at any one time). To simulate realistic cases, random noise was added to the measured position and velocity vectors in (2) for each target in frames 2 to 5. This measurement noise was uniformly distributed with zero mean and a maximum percent error in any measured position or velocity component of 0%, 5% and 10%.

Figure 5: Iterations vs. measurement error.

Figure 6: Neuron function for u0 = 0.2.

The average number of iterations required (over the four pairs of time frames and ten initial neuron vectors) for the different amounts of measurement noise in the two cases is shown in Fig. 5. As expected, the simpler case 1 scenario requires fewer iterations. The number of iterations required for any pair of frames ranged from 15 to 40 for case 1 and from 20 to 70 for case 2 (with the largest number of iterations, 70 as in Fig. 4, corresponding to the time frame when all six targets crossed with the maximum 10% measurement error present). The results show that even in the presence of measurement errors, the neural net was very robust and did not require a significant increase in the average number of iterations (i.e., 26 to 30 iterations for case 1 and 37 to 40 iterations for case 2). In all cases, the network converged to the correct solution for all six targets in each of the time frames.
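The measurement noise described above can be imitated as follows. This is a sketch under an explicit assumption: the percent error is taken relative to the magnitude of each individual component, which is one plausible reading of "maximum percent error in any measured position or velocity component". The function name `add_noise` and the sample vector are hypothetical.

```python
import random


def add_noise(vector, max_percent, rng):
    """Perturb each component by a uniform, zero-mean relative error.

    Assumption: the error bound is max_percent of that component's own value.
    """
    frac = max_percent / 100.0
    return [x * (1.0 + rng.uniform(-frac, frac)) for x in vector]


rng = random.Random(0)                 # fixed seed so the run is repeatable
position = [10.0, -4.0, 0.0]
for percent in (0, 5, 10):
    print(percent, add_noise(position, percent, rng))
```

Under this reading a component that is exactly zero stays zero, so an absolute noise floor would be needed if targets on an axis are also to be perturbed.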

The network parameters used were η = 0.3 and u0 = 0.2 in all runs. A graph of the neuron function for u0 = 0.2 is shown in Fig. 6. No intensive effort was made to optimize these parameters. This merits future work. For example, the neuron increment in (4) is linear in Xij, and for this case techniques for choosing η exist [4]. Extending these to piecewise-linear cases may be possible. Smaller u0 values increase the "binarization" of the neuron values and increase the probability of convergence to relatively shallow local minima rather than the correct global minimum. Larger u0 values make thresholding of the outputs more difficult. The u0 = 0.2 value used is a compromise between these effects.

The A and B parameters in (2) used to determine the bias vector were equal and constant for each case. We used A = B = 0.01 for all time frames in case 1, and A = B = 0.025 for case 2. Larger values for these parameters weight the bias term (due to the position and velocity vectors) more than the one-to-one correspondence terms in the energy function. For case 2, the larger A and B values were used to improve the rate at which the network converged. (Larger values also increased the probability of an incorrect solution.) The values used were not optimized in these initial tests.


Conclusion 


A Hopfield neural net using a quadratic cost function was successfully used to solve the multitarget tracking problem over pairs of consecutive time frames using 3-D target trajectories. This network can be implemented with a simple optical architecture to achieve real-time processing for large numbers of targets, with convergence in relatively few iterations (15-70). With optimization, faster convergence times are expected. The network proved to be robust, converging successfully with measurement errors of up to 10 percent.
ABSTRACT

A  symbolic  neural  net  is  described.  It  uses  a  multichannel  symbolic  correlator  to  produce  input 
neuron  data  to  an  optical  neural  net  production  system.  It  has  use  in  obstacle  avoidance,  navigation, 
and  scene  analysis  applications.  The  shift-invariance  and  ability  to  handle  multiple  objects  are  novel 
aspects  of  this  symbolic  neural  net.  Initial  simulated  data  are  provided  and  symbolic  optical  filter 
banks  are  discussed.  The  neural  net  production  system  is  described.  A  parallel  and  iterative  set  of 
rules  and  results  for  our  case  study  are  presented.  Its  adaptive  learning  aspects  are  noted. 

1. INTRODUCTION

Figure  1  shows  an  overview  of  the  symbolic  neural  net.  A  multichannel  correlator  (Section  2) 
analyzes  the  input  scene.  It  is  unique  since  it  can  handle  multiple  objects.  It  provides  a  symbolic 
description  of  the  input  scene.  This  is  converted  to  a  position  encoded  input  neuron  description  that 
indicates  what  basic  elements  are  present  in  each  region  of  the  scene  (Section  3).  This  is  a  new 
neuron  representation  space.  No  prior  neural  net  can  accommodate  multiple  objects  in  the  field  of 
view  (FOV).  A  neural  net  production  system  (Section  4)  then  determines  the  contents  of  each  spatial 
region  of  the  scene.  Sections  5  and  6  present  initial  simulation  data  using  this  system. 


FIGURE 1. Optical symbolic neural net.


2.  SYMBOLIC  CORRELATOR 




Figure 2 shows a 4-channel optical symbolic correlator [1]. P1 contains the input scene, 4 frequency-multiplexed filters are placed at P2 and 4 spatially separated correlation planes appear at P3. Each correlation plane contains peaks at the locations in P1 of the 4 different filter patterns used at P2 (thus shift invariance exists and multiple objects in the FOV can be handled). Only correlators provide this capacity in parallel. The 4 correlation plane patterns are read off point-by-point in parallel from top left to bottom right. A segmented CCD can achieve this. The output is a 4-digit symbolic word that describes the P2 filters' response for each spatial region of the scene at P1. With 4 binary encoded P2 filters (correlation peak values of 1 and 0), we can describe 2^4 = 16 objects. With F filters and L correlation peak levels, we can encode L^F classes; for example, L = 10 and F = 4 give 10^4 = 10,000 classes (this is an enormous potential). Multi-level encoding and distortion-invariant filters are possible and have been previously advanced [2]. Our production system neural net (Section 4) analyzes these symbolic outputs (for each spatial region) and thus we achieve the first shift-invariant multiple object neural net. Shift-invariance is achieved by use of a correlator. High capacity is achieved by symbolic encoding. Distortion-invariance is achieved by the use of synthetic discriminant function (SDF) [2] filters. Extensions to F > 4 filters (digits) are possible.
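The capacity argument above is simple positional arithmetic, as the following sketch illustrates. The mapping from a symbolic word to a class index (F1 as the most significant digit) is an assumption made here for illustration only, and the function names are hypothetical.

```python
def capacity(levels, filters):
    """Number of distinct symbolic words with `filters` digits of `levels` values."""
    return levels ** filters


def word_to_index(digits, levels):
    """Read a symbolic word (F1 first) as a base-`levels` integer class index."""
    index = 0
    for d in digits:
        if not 0 <= d < levels:
            raise ValueError("digit outside the allowed correlation peak levels")
        index = index * levels + d
    return index


print(capacity(2, 4), capacity(10, 4))
print(word_to_index([1, 0, 1, 1], 2))
```

With binary peaks, the word (1, 0, 1, 1) read out at one spatial region selects one of 16 possible classes for that region.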
FIGURE 2. Frequency-multiplexed optical symbolic correlator for neural net representation generation.


3.  NEURON  REPRESENTATION 

We show the four time sequential raster outputs from the 4 correlation planes (Figure 2) by F1 to F4 in Figure 3. These are our 4-digit descriptions of each P1 region. Our symbolic encoding system converts this into our position-encoded neuron description for input to our production system neural net as shown conceptually in Figure 4. In the specific input neuron representation we consider, each input neuron to our production system is position encoded to represent a generic object part.

4.  NEURAL  NET  PRODUCTION  SYSTEM  CONCEPT 

A production system consists of IF-THEN statements. Its realization is possible via a neural net or a symbolic substitution system [3]. Here, we consider its neural net realization. For our present problem, such a rule-based system is necessary to determine the input present at each spatial region of P1 from the 4-digit symbolic data of Figure 3 position-encoded as in Figure 4. We describe this concept by example. Figure 5 shows a simple set of 4 rules with antecedents on the left and consequents on the right.

FIGURE 3. Symbolic F1 to F4 encoding from Figure 2 correlator.

FIGURE 4. Neuron representation (position encoding) of symbolic optical correlator F1 to F4 outputs.

We write all rules as IF-THEN statements. We allow the AND of various antecedents and we allow (Figure 6) the OR of several such sets of antecedents. The antecedents (an) are facts known to be true. The output consequents (cn) are new true facts. If we denote each fact (antecedent or consequent) by a neuron in a specific position (location), then the rules can be described as weighted combinations of input neurons (true facts are input or output neurons that are active "1" and false facts are neurons with activation "0"). Figure 7 shows the neural net that realizes the rules in Figure 5. Figure 8 shows a standard optical matrix-vector multiplier that realizes the production system neural net of Figure 7. The input facts (neurons) are represented by activated point modulators at P1 (LEDs, laser diodes, etc.). The weights (rules) are the elements of an interconnection matrix at P2 and the output (consequent) facts are activated detector elements at P3. The P2 weights or P3 thresholds are adjusted to produce proper outputs [3]. The diagonal elements at P2 are one so that input facts remain true. New rules not present in the original rule base can be inferred and operations on facts with various degrees of confidence are possible as we have detailed [3].
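A rule base of this kind can be sketched as a thresholded matrix-vector product. The sketch makes several assumptions that are not taken from [3]: each antecedent of an AND rule receives weight 1/(number of antecedents), every output neuron uses a threshold of one, and each consequent has only one rule (the OR of several antecedent sets is not handled). The two rules are taken from the parallel rule base of Figure 11; the function names are hypothetical.

```python
FACTS = ["sfpost", "dome", "hbar", "fire", "fence"]
RULES = [(("sfpost", "dome"), "fire"),     # IF a_sfpost AND a_dome THEN c_fire
         (("sfpost", "hbar"), "fence")]    # IF a_sfpost AND a_hbar THEN c_fence


def build_weights(facts, rules):
    """Interconnection matrix: unit diagonal plus one row of weights per rule."""
    index = {name: n for n, name in enumerate(facts)}
    size = len(facts)
    W = [[0.0] * size for _ in range(size)]
    for n in range(size):
        W[n][n] = 1.0                      # diagonal: a true fact stays true
    for antecedents, consequent in rules:
        for a in antecedents:
            # Assumed weighting: all antecedents together sum to exactly one.
            W[index[consequent]][index[a]] = 1.0 / len(antecedents)
    return W


def apply_rules(W, state):
    """One matrix-vector pass followed by a threshold of one at each output."""
    return [1 if sum(w * s for w, s in zip(row, state)) >= 1.0 - 1e-9 else 0
            for row in W]


W = build_weights(FACTS, RULES)
state = [1, 1, 0, 0, 0]                    # short fat post and dome detected
print(apply_rules(W, state))
```

With only the short fat post and the dome active, the fire hydrant neuron reaches threshold while the fence neuron receives half the required input and stays off.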


5.  SIMULATION  RESULTS 

For obstacle avoidance applications, we require only the relative size of the objects (holders etc.) in each spatial region of P1. For navigation, we must identify what exists in each P1 region and relate it to a global map of the scene. For general scene analysis, we desire to know what is present (object name) in each region of P1.
5.1 Database

To demonstrate all aspects of such a neural net, we consider a database consisting of 9 objects (Figure 9) with 2 objects shown enlarged for clarity. We used a synthavision solid model description in which each object consists of basic general shapes or parts. Figure 10 shows the 12 object parts we consider. Thus, the version of the general problem that we address is the location and recognition of each object part in the input scene and the subsequent identification of what object is present from this object part information. We use an optical correlator (Figure 2) to locate the object parts present and we use an optical neural net production system (Figure 8) to determine what object is present. Table 1 lists the 9 objects we consider. Table 2 lists the parts we used to synthesize each. Table 3 lists the 3 clusters into which we grouped these objects. Table 4 lists the parts we use within each cluster to uniquely describe the objects within that cluster.

We formed SDF filters for each object part. To test the invariance of these filters, we formed 3 or 4 distorted versions of each object and of each object part. These images corresponded to 0°, 30°, 60° and 90° (or variations thereof as noted) rotated aspect versions of the original (Figures 9 and 10) objects and parts. We formed projection SDFs [2] from the 4 different aspect views of each object part and tested each such filter against the test set of distorted objects.


5.2  Distortion-Invariant  Part  Recognition 

The results in Tables 5-7 show that the filters for the object parts within each cluster can adequately discriminate the object parts used and that the parts can be recognized invariant to distortions. The underlined entries in these tables of data indicate the object parts declared present in the object under test (with proper thresholds used). These results show that the correlation filters (one per part) can identify all tested object parts independent of position and distortion. The "tank" object was a poor choice and was not used.

6.  NEURAL  NET  PRODUCTION  SYSTEM 

We  used  the  results  of  Section  5  for  our  parts  list  and  our  object  tests,  and  our  production  system 
concept  (Section  4)  to  devise  a  parallel  (Figure  11)  and  an  iterative  (Figure  12)  neural  net  production 
system  design.  We  tested  these  neural  net  production  systems  on  our  database  with  successful 
results. 
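The difference between the parallel and the iterative designs is that, in the iterative case, intermediate consequents are fed back as antecedents on the next pass. A minimal sketch of that feedback loop follows, using three rules from the iterative rule base of Figure 12. It is a symbolic stand-in for the neural net, not the optical system itself; the function name `infer`, the underscore-joined fact names and the pass limit are hypothetical.

```python
RULES = [
    ({"sfpost"}, "fire_or_fence"),             # IF a_sfpost THEN c_{fire or fence}
    ({"fire_or_fence", "dome"}, "fire"),       # ... AND a_dome THEN c_fire
    ({"fire_or_fence", "hbar"}, "fence"),      # ... AND a_hbar THEN c_fence
]


def infer(facts, rules, max_passes=10):
    """Apply all rules repeatedly until no new fact is produced."""
    known = set(facts)
    for _ in range(max_passes):
        # A rule fires when all of its antecedents are already known facts.
        new = {c for antecedents, c in rules if antecedents <= known} - known
        if not new:
            break
        known |= new                           # feed consequents back as inputs
    return known


print(sorted(infer({"sfpost", "hbar"}, RULES)))
```

Two passes are needed in this example: the first establishes the intermediate fact "fire or fence", and the second combines it with the horizontal bar to conclude "fence".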


Summary 

In this paper we have unified our prior symbolic correlator [1] and production system [3] neural net work into a symbolic neural net. It provides the ability to handle multiple objects in the field of view. It is shift-invariant, distortion-invariant and has the potential for high capacity. Initial simulation results were presented.


FIGURE 9. The 9 objects (and scaled versions of 2) in our initial production system. Labels recoverable from the figure: c_fire = fire hydrant, c_fence = fence, c_house = house, big car (to provide detail).


FIGURE 10. The 12 parts used to describe as facts the multiple objects (Figure 9) in our database. Labels recoverable from the figure: a_hbar = horizontal bar, a_sfpost = short fat post, a_dome = top dome (fire hydrant), a_post = long thin post, a_name = name plate (street sign), a_tbox = box (traffic light), a_carbod = car body, a_roof = roof, a_tank = gas tank (motor cycle).
Symbol      Object
c_fire      fire hydrant
c_fence     fence
c_tlight    traffic light
c_lamp      street lamp
c_sign      street sign
c_motorc    motor cycle
c_car       car
c_truck     truck
c_house     house

TABLE 1: Database of 9 Objects


Object          Components
Fire hydrant    short fat post, dome on top, 3 arms
Fence           3 short fat posts, 2 horizontal bars
Traffic light   long thin post, box, 3 lights
Street lamp     long thin post, light, dome on top
Street sign     long thin post, rectangular name plate
Motor cycle     2 wheels, engine, gas tank, seat, pipes
Car             car body, 4 wheels, wedge shaped car top
Truck           big square body, 8 wheels, cabin
House           big square body, wedge shaped roof

TABLE 2: Components Used to Construct Objects

CLUSTER 1:  SHORT OBJECTS (fire hydrant, fence)
CLUSTER 2:  TALL OBJECTS (traffic light, street sign and lamp)
CLUSTER 3:  BIG OBJECTS (motor cycle, car, house and truck)

TABLE 3: Multiple Object Clusters Used for First Separation of Objects

CLUSTER-1 PARTS:  short fat post, dome, horizontal bar
CLUSTER-2 PARTS:  long thin post, box, light, rectangular name plate
CLUSTER-3 PARTS:  gas tank, car body, wheel, big body, wedge shaped roof

TABLE 4: Symbolic Parts for each Object Cluster
          fence 0   fence 30  fence 60  fence 90  fire 0    fire 30   fire 60   fire 90
hbar      1.195     1.184     1.130     0.808     0.823     0.687     0.620     0.738
dome      0.634     0.678     0.661     0.492     0.885     0.885     0.885     0.885
sfpost    0.742     0.742     0.737     0.871     0.892     0.868     0.858     0.946

TABLE 5: Cluster 1 Cross Correlation Data


          lamp      tlight 0  tlight 20 tlight 40 tlight 60 sign 0    sign 20   sign 40   sign 60
light     1.000     0.386     0.514     0.475     0.402     0.440     0.423     0.394     0.386
tbox      0.692     1.063     1.049     1.046     0.903     0.682     0.679     0.686     0.682
name      0.747     0.624     0.704     0.670     0.613     0.995     1.008     1.069     1.106
post      0.683     1.000     0.989     0.989     0.989     0.667     0.667     0.667     0.667

TABLE 6: Cluster 2 Cross Correlation Data


          car 0     car 30    car 60    house 0   house 30  house 60
bigbod    0.355     0.374     0.312     0.981     1.063     [one house value missing]
carbod    1.097     1.085     1.020     0.622     0.578     0.625
roof      0.306     0.305     0.289     1.032     1.002     1.034
tank      0.981     1.000     0.944     0.611     0.667     0.722
wheel     1.829     1.868     1.485     0.780     0.778     0.793

          motorc 0  motorc 30 motorc 60 truck 0   truck 30  truck 60
bigbod    0.232     0.222     0.179     1.047     1.070     0.872
carbod    0.927     0.720     0.655     0.750     0.731     0.695
roof      0.240     0.241     0.212     0.564     0.521     0.462
tank      1.000     1.000     1.000     1.000     1.000     1.000
wheel     1.962     1.995     1.943     1.827     1.810     1.586

TABLE 7: Cluster 3 Cross Correlation Data


(a) Rule base:

IF a_sfpost    AND a_dome    THEN c_fire
IF a_sfpost    AND a_hbar    THEN c_fence
IF a_post      AND a_light   THEN c_lamp
IF a_post      AND a_name    THEN c_sign
IF a_post      AND a_tbox    THEN c_tlight
IF a_wheel(s)  AND a_tank    THEN c_motorc
IF a_wheel(s)  AND a_carbod  THEN c_car
IF a_wheel(s)  AND a_bigbod  THEN c_truck
IF a_roof      AND a_bigbod  THEN c_house

(b) [Neural net diagram. Input neuron labels: SFPOST, DOME, HBAR, POST, LIGHT, NAME, TBOX, WHEEL, TANK, CARBOD, BIGBOD, ROOF. Legible output neuron labels: FIRE, FENCE, LAMP, SIGN, TLIGHT.]


Figure 11. Rule base (a) and neural net (b) for a parallel production system.


(a) Rule base:

IF a_sfpost                                     THEN c_{fire or fence}
IF a_{fire or fence}           AND a_dome       THEN c_fire
IF a_{fire or fence}           AND a_hbar       THEN c_fence
IF a_post                                       THEN c_{lamp, sign or tlight}
IF a_{lamp, sign or tlight}    AND a_light      THEN c_lamp
IF a_{lamp, sign or tlight}    AND a_tbox       THEN c_tlight
IF a_{lamp, sign or tlight}    AND a_name       THEN c_sign
IF a_{wheels close}                             THEN c_{motorc or truck}
IF a_{wheels far apart}                         THEN c_{car or truck}
IF a_bigbod                                     THEN c_{house or truck}
IF a_{motorc or truck}         AND a_bigbod     THEN c_truck
IF a_{motorc or truck}         AND a_tank       THEN c_motorc
IF a_{car or truck}            AND a_carbod     THEN c_car
IF a_{house or truck}          AND a_roof       THEN c_house


Figure 12. Rule base (a) and neural net (b) for an iterative production system.
ABSTRACT

An  optical  symbolic  neural  net  is  described.  It  uses  an  optical  symbolic  correlator.  This  produces 
a  new  input  neuron  representation  space  that  is  shift-invariant  and  can  accommodate  multiple 
objects.  No  other  neural  net  can  handle  multiple  objects  within  the  field  of  view.  Initial  optical 
laboratory  data  are  presented.  An  optical  neural  net  production  system  processes  this  new  neuron 
data.  This  aspect  of  the  system  is  briefly  described. 

1.  INTRODUCTION 


Figure  1  shows  an  overview  of  the  symbolic  neural  net.  A  multichannel  correlator  (Section  2) 
analyzes  the  input  scene.  It  is  unique  since  it  can  handle  multiple  objects.  It  provides  a  symbolic 
description  of  the  input  scene.  This  is  converted  to  a  position  encoded  input  neuron  description  that 
indicates  what  basic  elements  are  present  in  each  region  of  the  scene  (Section  3).  This  is  a  new 
neuron  representation  space.  No  prior  neural  net  can  accommodate  multiple  objects  in  the  field  of 
view  (FOV).  A  neural  net  production  system  (Section  4)  then  determines  the  contents  of  each  spatial 
region  of  the  scene.  Sections  5  and  6  present  initial  optical  data  using  this  system. 


FIGURE  1.  Optical  symbolic  neural  net. 


2.  SYMBOLIC  CORRELATOR 


Figure 2 shows a 4-channel optical symbolic correlator [1]. P1 contains the input scene, 4
frequency-multiplexed filters are placed at P2, and 4 spatially separated correlation planes appear at
P3. Each correlation plane contains peaks at the locations in P1 of the 4 different filter patterns used
at P2 (thus shift invariance exists and multiple objects in the FOV can be handled). Only correlators
provide this capacity in parallel. The 4 correlation plane patterns are read off point-by-point in
parallel from top left to bottom right. A segmented CCD can achieve this. The output is a 4-digit
symbolic word that describes the P2 filters' response for each spatial region of the scene at P1. With
4 binary encoded P2 filters (correlation peak values of 1 and 0), we can describe 2^4 = 16 objects.
With F filters and L correlation peak levels, we can encode L^F classes (e.g., 10^4 = 10,000 classes
for F = 4 and L = 10; this is an enormous potential). Multi-level encoding and distortion-invariant
filters are possible and have been previously advanced [2]. Our production system neural net
(Section 4) analyzes these symbolic outputs (for each spatial region) and thus we achieve the first
shift-invariant multiple object neural net. Shift-invariance is achieved by use of a correlator. High
capacity is achieved by symbolic encoding. Distortion-invariance is achieved by the use of synthetic
discriminant function (SDF) [2] filters. Extensions to F > 4 filters (digits) are possible. Section 5
provides initial optical laboratory results obtained with single and multiplexed filters.
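
The symbolic encoding described above can be illustrated with a short sketch. It is not the optical system itself: the function names `capacity` and `symbolic_words`, the threshold of 0.5, and the toy correlation planes are all assumptions made for illustration. It shows the L^F capacity count and how F thresholded correlation planes combine into one F-digit word per spatial region.

```python
def capacity(num_filters, num_levels):
    """Number of distinct symbolic words for F filters and L peak levels (L**F)."""
    return num_levels ** num_filters


def symbolic_words(planes, threshold=0.5):
    """Combine F binary-thresholded correlation planes into one word per region.

    planes: list of F equally sized 2-D lists of correlation peak values.
    Returns a 2-D list of F-tuples of 0/1 digits (one digit per filter).
    The threshold is a hypothetical value chosen for this toy example.
    """
    rows, cols = len(planes[0]), len(planes[0][0])
    return [
        [tuple(1 if p[r][c] >= threshold else 0 for p in planes) for c in range(cols)]
        for r in range(rows)
    ]


# Toy example: 4 filters observing a scene split into a 1 x 2 grid of regions.
planes = [
    [[0.9, 0.1]],  # filter 1 response in each region
    [[0.8, 0.2]],  # filter 2
    [[0.1, 0.7]],  # filter 3
    [[0.0, 0.0]],  # filter 4
]
words = symbolic_words(planes)
print(capacity(4, 2), capacity(4, 10))
print(words)
```

Because each region gets its own word, two different objects in the field of view simply produce two different words at two different positions.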

[Figure 2 diagram labels: input, lens L1, multiple frequency-multiplexed filters at P2.]

FIGURE 2. Frequency-multiplexed optical symbolic correlator for
neural net representation generation.


3.  NEURON  REPRESENTATION 

We denote the time sequential raster outputs from the 4 correlation planes (Figure 2) by F1 to F4
in Figure 3. These are our 4-digit descriptions of each P1 region. They are obtained in parallel for
each P1 region. Our symbolic encoding system converts this into our position-encoded neuron
description for input to our production system neural net as shown conceptually in Figure 4. In the
specific input neuron representation we consider, each input neuron to our production system is
position encoded to represent a generic object part.

4.  NEURAL  NET  PRODUCTION  SYSTEM  CONCEPT 

A production system consists of IF-THEN statements. Its realization is possible via a neural net or
a symbolic substitution system [3]. Here, we consider its neural net realization. For our present
problem, such a rule-based system is necessary to determine the input present at each spatial region
of P1 from the 4-digit symbolic data of Figure 3 position-encoded as in Figure 4. We describe this
concept by example. Figure 5 shows a simple set of 4 rules with antecedents on the left and
consequents on the right. We write all rules as IF-THEN statements. We allow the AND of various
antecedents and we allow (Figure 6) the OR of several such sets of antecedents. The antecedents
(a_n) are facts known to be true. The output consequents (c_n) are new true facts. If we denote each
fact (antecedent or consequent) by a neuron in a specific position (location), then the rules can be
described as weighted combinations of input neurons (true facts are input or output neurons that are
active, "1", and false facts are neurons with activation "0"). Figure 7 shows the neural net that realizes
the rules in Figure 5. Figure 8 shows a standard optical matrix-vector multiplier that realizes the
production system neural net of Figure 7. The input facts (neurons) are represented by activated
point modulators at P1 (LEDs, laser diodes, etc.). The weights (rules) are the elements of an
interconnection matrix at P2 and the output (consequent) facts are activated detector elements at
P3. The P2 weights or P3 thresholds are adjusted to produce proper outputs [3]. The diagonal
elements at P2 are one so that input facts remain true. New rules not present in the original rule base
can be inferred and operation on facts with various degrees of confidence is possible, as we have
detailed [3].


[Figure 3 diagram: the raster outputs F1 to F4 of the four output correlation planes at P3 (after lens L2) form the 4-digit symbolic description of each spatial region of the P1 scene.]

FIGURE 3. Symbolic F1 to F4 encoding from Figure 2 correlator.

FIGURE 4. Neuron representation (position encoding) of symbolic
optical correlator F1 to F4 outputs.
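
A minimal numerical sketch of the matrix-vector production system concept follows. It assumes binary facts, one rule per consequent, AND weights of 1/n for a rule with n antecedents, and a unit threshold; the fact names, the two example rules, and the helper names `build_weights` and `infer` are hypothetical and are not taken from Figure 5. The OR of several antecedent sets and graded confidences are not modeled here.

```python
import numpy as np

FACTS = ["a", "b", "c", "d"]                 # hypothetical fact names
RULES = [(("a", "b"), "c"), (("c",), "d")]   # a AND b -> c ; c -> d


def build_weights(facts, rules):
    """Interconnection matrix: row = output neuron, column = input neuron."""
    idx = {f: i for i, f in enumerate(facts)}
    w = np.eye(len(facts))                   # diagonal ones keep true facts true
    for antecedents, consequent in rules:
        for a in antecedents:
            # Each antecedent contributes 1/n, so only the full AND reaches 1.
            w[idx[consequent], idx[a]] = 1.0 / len(antecedents)
    return w


def infer(w, x, max_cycles=10):
    """Iterate matrix-vector product plus threshold until no new fact appears."""
    x = np.asarray(x, dtype=float)
    for _ in range(max_cycles):
        y = (w @ x >= 1.0 - 1e-9).astype(float)   # unit threshold at the detectors
        if np.array_equal(y, x):
            break
        x = y
    return x


W = build_weights(FACTS, RULES)
print(infer(W, [1, 1, 0, 0]))   # a and b known
print(infer(W, [1, 0, 0, 0]))   # only a known
```

In the first call, fact c is inferred on the first cycle and d on the second, which illustrates why feedback lets later rules use recently inferred facts.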


5.  OPTICAL  LABORATORY  RESULTS 


For obstacle avoidance applications, we require only the relative size of the objects (holders etc.)
in each spatial region of P1. For navigation, we must identify what exists in each P1 region and relate
it to a global map of the scene. For general scene analysis, we desire to know what is present (object
name) in each region of P1.


[Figure 5 diagram: four simple if-then rules; only the first rule, a -> b, is fully legible in this copy.]

FIGURE 5. Simple if-then production rules.

[Figure 6 diagram labels: antecedents a1, a2, a3, a4.]

FIGURE 6. Production system with the OR of several paths to the same consequent c_n.

[Figure 7 diagram labels: inputs (from sensors, correlators or loopback); outputs (control signals or just feedback).]

FIGURE 7. Neural net for the rules in Figure 5.

[Figure 8 diagram labels: L1, P2, L2, P3.]

FIGURE 8. Optical neural net.


5.1 Database


To demonstrate this concept, we considered a database of 9 objects (Table 1). Each was
produced from generic shapes (object parts) using a Synthavision system including lighting and
illumination effects. We divided the 9 objects into 3 clusters (Table 2) with various generic shapes
(object parts) comprising each (Table 3). For each object and each of the 12 generic shapes, we
generated from 3 to 5 different aspect views. Distortion-invariant projection SDF filters [2] were
formed for each of the 12 shapes and tested against the distorted objects to verify that we could
recognize each of the 12 shapes independent of distortions. The shapes within the test objects
differed from those from which the filters were formed due to shading and illumination, edge
thicknesses, and occlusion (e.g., the rods and posts on the fences, the wheels on the vehicles, etc.).
Initial simulation results [4] were successful. In simulation tests, we showed success in recognizing
distorted parts and discriminating between parts in objects. We now consider optical laboratory
results obtained with a correlator.


c_fire  = fire hydrant     c_tlight = traffic light    c_truck  = truck
c_fence = fence            c_lamp   = street lamp      c_car    = car
c_sign  = street sign      c_house  = house            c_motorc = motor cycle

TABLE 1: Database of 9 Objects

CLUSTER 1:  SHORT OBJECTS (fire hydrant, fence)
CLUSTER 2:  TALL OBJECTS (traffic light, street sign and lamp)
CLUSTER 3:  BIG OBJECTS (motor cycle, car, house and truck)

TABLE 2: Multiple Object Clusters Used for First Separation of Objects


CLUSTER-1 PARTS:  short fat post, dome, horizontal bar
CLUSTER-2 PARTS:  long thin post, box, light, rectangular name plate
CLUSTER-3 PARTS:  gas tank, car body, wheel, big body, wedge shaped roof


TABLE 3: Symbolic Parts for each Object Cluster


5.2 Filter Synthesis and Testing

Projection SDFs [2] were formed with no false class training images used and with true class
peaks set to 1.0. When a part was aspect view symmetric (e.g., the short fat post, the dome, the post
and light), only one training image was used (otherwise the matrix used in synthesis is singular). For
objects with more than one occurrence of one part, only one was used. The parts used were isolated
and one fixed illumination was used for all aspect views of each part. A 3x3 Sobel operator was used
to edge enhance the data, which was then binarized to produce edges thicker than 2 pixels. In tests,
different illuminations and edge widths plus occlusions occurred, and when multiple parts are present
in an object, each has a different shape and edge width. In addition, in training the parts are rotated
about their geometric center, while in testing the objects are rotated about a different point. Thus,
considerable differences exist in the training and test data.
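
The synthesis step can be sketched numerically. The sketch below uses the commonly cited projection SDF form, h = X (X^T X)^-1 u, where the columns of X are the vectorized training images and u holds the specified peak values; that this matches the exact procedure of [2] is an assumption, and the function name `projection_sdf` and the random stand-in images are hypothetical. If two training images are identical (as for an aspect-symmetric part), X^T X becomes singular and the solve step is expected to fail or be ill-conditioned, which is consistent with using a single training image in that case.

```python
import numpy as np


def projection_sdf(train_images, peaks):
    """Filter as a linear combination of training images with specified peaks.

    train_images: list of equally sized 2-D arrays (aspect views of one part).
    peaks: desired central correlation value for each training image.
    """
    x = np.stack([np.asarray(im, dtype=float).ravel() for im in train_images], axis=1)
    gram = x.T @ x                                  # inner products between views
    coeffs = np.linalg.solve(gram, np.asarray(peaks, dtype=float))
    return (x @ coeffs).reshape(np.asarray(train_images[0]).shape)


rng = np.random.default_rng(0)
views = [rng.random((8, 8)) for _ in range(3)]      # stand-ins for 3 aspect views
h = projection_sdf(views, [1.0, 1.0, 1.0])          # true class peaks set to 1.0
responses = [float(np.sum(h * v)) for v in views]   # central correlation values
print(responses)
```

Each training view then yields the specified peak of 1.0 at the correlation center; views not in the training set are only approximately controlled.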


5.3  Advanced  Considerations 


When multiple parts are present in an object, the number of parts and their locations are useful
features. Not all such parts can be located without reducing the threshold and thereby allowing false
alarms. The NN could possibly solve such problems. Increasing the number of training images to
include the different versions of a part in each object would help, as would including false class
training images. Another issue is the presence of several correlation peaks above threshold around a
central peak (due to the various lines in the parts and object). Blob coloring allows us to select only
the true central peak and will aid in locating multiple parts.
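
Blob coloring (connected-component labeling) of a thresholded correlation plane can be sketched as follows. This is an illustrative version only: the 8-connected neighborhood, the threshold value, the function name `blob_peaks`, and the toy plane are assumptions, not details given in the text.

```python
from collections import deque


def blob_peaks(plane, threshold):
    """Return one (row, col, value) peak per connected above-threshold blob."""
    rows, cols = len(plane), len(plane[0])
    seen = [[False] * cols for _ in range(rows)]
    peaks = []
    for r0 in range(rows):
        for c0 in range(cols):
            if seen[r0][c0] or plane[r0][c0] < threshold:
                continue
            # Flood-fill one blob, remembering its largest value.
            seen[r0][c0] = True
            queue = deque([(r0, c0)])
            best = (plane[r0][c0], r0, c0)
            while queue:
                r, c = queue.popleft()
                best = max(best, (plane[r][c], r, c))
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        rr, cc = r + dr, c + dc
                        if (0 <= rr < rows and 0 <= cc < cols
                                and not seen[rr][cc]
                                and plane[rr][cc] >= threshold):
                            seen[rr][cc] = True
                            queue.append((rr, cc))
            peaks.append((best[1], best[2], best[0]))
    return peaks


# A central peak surrounded by sidelobes above threshold, plus a second part.
plane = [
    [0.1, 0.6, 0.2, 0.0, 0.0],
    [0.6, 1.0, 0.7, 0.0, 0.0],
    [0.2, 0.6, 0.1, 0.0, 0.9],
]
print(blob_peaks(plane, 0.5))
```

The five above-threshold points around the first peak collapse to a single detection, so the count of detections equals the number of parts rather than the number of sidelobes.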


5.4  Resolution 


We  reduced  the  resolution  for  the  street  lamp  and  traffic  light  objects  and  the  light  part  (in  the 
street  lamp)  and  the  tbox  part  (in  the  traffic  light)  from  256x256  down  to  32x32.  Table  4  lists  the 
autocorrelation  peak  (it  is  underlined)  for  the  different  inputs  (horizontal)  with  the  2  different  filters 
(vertical).  All  intra-image  peaks  (within  the  same  true  correlation  plane  with  the  part  present  in  the 
object)  were  less  than  the  autocorrelation  peak  value.  The  largest  inter-image  crosscorrelation  peak 
value  is  given  in  the  table  (this  is  the  largest  peak  anywhere  when  the  input  object  does  not  contain 
the  part).  Table  4  shows  Pc  =  100%  discrimination  is  possible  with  64x64  resolution.  The  minimum 
true  peak  values  remain  about  the  same  as  resolution  decreases,  while  the  maximum  false  peak  value 
increases.  This  is  expected,  since  with  reduced  resolution  objects  look  more  similar  and  we  expect 
both  true  and  false  peaks  to  tend  to  the  same  value. 


5.5  Filter  Quantization 

We also quantized the number of amplitude levels in the image version of the filter. This is a
practical issue in an optical realization. Table 5 shows selected results. No significant degradation in
true or false peak values occurred down to 8 levels. The auto and crosscorrelation peak values with
MSFs of the light and post are not affected as they use only one binary training image. As few as 4
levels can thus be used in encoding the filters.
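
Amplitude quantization of a filter image can be sketched as below. Uniform quantization between the minimum and maximum filter values is an assumption (the text does not state how the levels were spaced), and the function name `quantize` and the sample values are hypothetical.

```python
import numpy as np


def quantize(filter_image, levels):
    """Uniformly quantize filter amplitudes to the given number of levels."""
    f = np.asarray(filter_image, dtype=float)
    lo, hi = f.min(), f.max()
    if hi == lo:
        return f.copy()                       # constant image: nothing to quantize
    steps = np.round((f - lo) / (hi - lo) * (levels - 1))
    return lo + steps * (hi - lo) / (levels - 1)


sample = [[0.0, 0.3], [0.6, 1.0]]
two_level = quantize(sample, 2)
print(two_level)

rng = np.random.default_rng(1)
four_level = quantize(rng.random((16, 16)), 4)
print(np.unique(four_level).size)
```

A filter built from one binary training image already has only two amplitude values, so quantizing it to two or more levels leaves it unchanged, which matches the remark about the light and post MSFs.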


5.6  Optical  Laboratory  Single  Filters 

All optical laboratory tests used a standard VanderLugt correlator with film input and the MSFs
recorded on an NRC thermoplastic camera. The input FT lens L1 had f_L = 495 mm, the FT lens L2 had
f ≈ 400 mm and the correlation was determined with a camera. The reference beam angle used was
θ = 30°, corresponding to a spatial frequency α = (sin θ)/λ ≈ 8 x 10^6 cy/cm with He-Ne light (the
center of the thermoplastic camera's bandpass response). The bias exposure for the thermoplastic
camera was 20 μJ. The K ratio was 4-10 measured over the full reference-to-signal beams.


                              lamp     tlight   tlight   tlight   tlight
                              0°       0°       20°      40°      60°

256 x 256 images and SDFs
  light                       1.000    0.386    0.514    0.475    0.402
  tbox                        0.692    1.063    1.049    1.046    0.903

128 x 128 images and SDFs
  light                       1.000    0.482    0.729    0.541    0.541
  tbox                        0.701    1.072    1.063    1.012    0.978

64 x 64 images and SDFs
  light                       1.000    0.633    0.800    0.633    0.800
  tbox                        0.742    1.016    1.012    1.033    1.021

32 x 32 images and SDFs
  light                       1.000    1.000    1.000    1.000    1.000
  tbox                        0.875    1.000    1.000    1.000    0.938

TABLE 4: Cluster 2 auto- and inter-image
correlation peak values in spatial resolution
experiments.


[Column headings: lamp 0°; sign 0°, 20°, 40°, 60°; tlight 0°, 20°, 40°, 60°]

Continuous  levels 

j  n«me 

0.747 

0  993 

1.009 

1222 

1122 

0.025 

0.704 

0.670 

0.G13 

i  iigto 

1  nno 

0.440 

0  425 

0.394 

0.3*0 

0.3% 

0.514 

0  475 

0.402 

j  tbox 

0.692 

0  <i°2 

0.C79 

0  6% 

0.6F2 

1222 

3.04Q 

1  046 

0  903 

I  post 

0  <••‘■3 

0  667 

0  667 

0  667 

0607 

i  noo 

0  9% 

0.9% 

0.9% 

256  levels 

Lame 

0.747 

0  999 

laoa 

1222 

U21- 

0.023 

0.704 

0.C71 

0.012 

light 

1.000 

0.4-10 

0.425 

0.394 

0.3% 

0.3% 

0.534 

0.475 

0  402 

tbox 

0.090 

0.6  SO 

0.C7G 

0.663 

0.6S0 

1222 

1  045 

1 .044 

0%<- 

post 

0*c3 

0  60“ 

0.667 

2407 

JL22I 

1.000 

09% 

09% 

0-9% 

A  levels 

name 

0.s0^ 

1122 

1221 

3  155 

1212 

0.666 

0.751 

0.734 

light 

]  .nno 

0.425 

0.3% 

0.366 

0.514 

0.475 

0.402 

tbox 

0691 

0.715 

12212 

1222 

1.036 

0976 

J.OSi 

0  693 

0.6G7 

0.6G7 

0.667 

0.9<tQ 

2  levels 

nauiie 

0.7*4 

0122 

0902 

Q922 

1  542 

0.757 

0.611 

o 

■f 

o 

0.722 

light 

1222 

0.440 

0.425 

0.394 

0.366 

0.3% 

0.514 

0475 

0.402 

tbox 

0.652 

0.596 

0.621 

0.621 

0.563 

1122 

1.022 

0924 

Qi72 

po<l 

Oovj 

0  067 

0607 

0  66,7 

0007 

1000 

Q.oaQ 

09% 

0.9*9 

TABLE  5:  Cluster  2  auto-  and  inter-image 
correlation  peak  values  in  amplitude 
quantization  experiments. 


Table  6  shows  optical  laboratory  results  obtained  for  different  SDF  filters  versus  different  input 
objects.  The  true  peak  values  are  for  autocorrelation  peaks.  The  largest  false  peak  value  anywhere 
is  listed  when  a  false  class  input  is  present.  Tests  (a)  show  that  the  post  can  be  determined  in  all 
traffic  light  inputs.  The  post  is  also  present  in  the  sign  and  lamp  objects  and  its  peak  values  with 
these  inputs  are  similar.  The  tbox  in  test  (b)  is  only  present  in  the  tlight.  It  is  recognized  in  all  tlight 
inputs.  It  is  not  present  in  the  sign  and  lamp  objects  and  these  peaks  are  all  below  the  lowest  true 
peak.  Tests  (c)  and  (d)  show  that  multiple  parts  can  also  be  recognized  (in  some  object  views,  where 
expected). 


Figure 9 shows various laboratory results. In Figure 9a, the input was the 0° light and the post
filter was used. The 3 correlation peaks for the sfpost filter and the 0° fence are shown (Figure 9b)
as are the four correlation peaks of the wheel and the 0° truck. The values in parentheses indicate
the peak value (or their range for the case when multiple parts are present).


5.7  Optical  Laboratory  Frequency-Multiplexed  Filter  Tests 


A frequency-multiplexed optical laboratory system was assembled. Four frequency-multiplexed
SDFs were fabricated. Figure 10 shows the images of the 4 SDFs (two use only one reference
pattern). These 4 SDFs were placed side-by-side in the input with 4 mm between each. The full input
was 14 mm wide. The frequency-multiplexed filter was formed with one exposure of this input. The
output contains 4 correlation planes (left to right) of the input test object with filters of the post, tbox,
light and name plate respectively. Figure 11 shows typical results obtained with 3 different test
inputs.


(a) POST FILTER              (b) TBOX FILTER

TRUE INPUT   TRUE PEAK       TRUE INPUT   TRUE PEAK       FALSE INPUT   LARGEST PEAK
tlight 0°    121             tlight 0°    237             sign 0°       115
tlight 20°   148             tlight 20°   227             sign 20°      120
tlight 40°   109             tlight 40°   160             sign 40°      143
tlight 60°   102             tlight 60°   158             sign 60°      120
                                                          lamp          155

(c) SF-POST FILTER           (d) WHEEL FILTER
    vs. 0° FENCE INPUT           vs. 0° TRUCK INPUT

POST   TRUE PEAK             WHEEL   TRUE PEAK
1      111                   1       234
2      96                    2       255
3      103                   3       216
                             4       197

TABLE 6: Single Filter Optical Laboratory Test Results


5.8 Optical Laboratory CGH Results

Optical  filters  were  also  synthesized  with  a  new  CGH  encoding  technique  [5].  The  results  are 
summarized  in  Table  7.  The  light  part  is  present  only  in  the  lamp  and  the  tbox  part  is  present  only  in 
the  tlight.  They  show  much  better  agreement  with  simulations,  since  the  CGH  encoding  used  is  very 
accurate. 


(a) tlight input (contains post and tbox)

(b) lamp input (contains post and light)

(c) sign input (contains post and name plate)


FIGURE 11. Optical outputs of frequency-multiplexed filters of (left to right)
the post, tbox, light, and name plate.


INPUT          lamp 0°   tlight 0°   tlight 20°   tlight 40°   tlight 60°   auto
light filter   64        41          51           41           51           64
tbox filter    45        62          61.5         63           62           61

TABLE 7: Optical Laboratory Results using New CGH Encoded Filters (64x64 images).
The maximum false class peak anywhere and the true peak (underlined) are shown.


6.  NEURAL  NET  PRODUCTION  SYSTEM 

We  used  the  results  of  Section  5  for  our  parts  list  and  our  object  tests,  and  our  production  system 
concept  (Section  4)  to  devise  a  parallel  (Figure  12)  and  an  iterative  (Figure  13)  neural  net  production 
system  design.  We  tested  these  neural  net  production  systems  on  our  database  with  successful 
results. 
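
The parallel design can be illustrated in software with the nine rules of the parallel rule base, each requiring the AND of two detected parts. This is a sketch of the rule logic only, not of the neural net or its optical realization; the names `RULES` and `classify` are hypothetical, and the part and object labels follow the abbreviations used in the tables.

```python
# Object -> set of parts that must all be detected in a spatial region.
RULES = {
    "fire": {"sfpost", "dome"},
    "fence": {"sfpost", "hbar"},
    "lamp": {"post", "light"},
    "sign": {"post", "name"},
    "tlight": {"post", "tbox"},
    "motorc": {"wheel", "tank"},
    "car": {"wheel", "carbod"},
    "truck": {"wheel", "bigbod"},
    "house": {"roof", "bigbod"},
}


def classify(parts):
    """Return every object whose rule is satisfied by the detected parts."""
    found = set(parts)
    return sorted(obj for obj, needed in RULES.items() if needed <= found)


print(classify({"post", "tbox"}))             # parts found in one region
print(classify({"wheel", "bigbod", "roof"}))  # ambiguous evidence fires two rules
print(classify({"post"}))                     # a post alone decides nothing
```

All rules are evaluated in one pass, which is what makes this design parallel; the iterative design instead first narrows the choice to a group of objects and then refines it on later cycles.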

Figure 12. Rule base (a) and neural net (b) for a parallel production system.

Figure 13. Rule base (a) and neural net (b) for an iterative production system.

CHAPTER 9

"Optical  Production  Systems  Using 
Neural  Networks  and  Symbolic  Substitution" 


Reprinted from Applied Optics, Vol. 27, page 5185, December 15, 1988
Copyright  ©  1988  by  the  Optical  Society  of  America  and  reprinted  by  permission  of  the  copyright  owner. 


Optical  production  systems  using  neural  networks  and 
symbolic  substitution 


Elizabeth  Botha,  David  Casasent,  and  Etienne  Barnard 


Two optical implementations of production systems are advanced. The production systems operate on a
knowledge base where facts and rules are encoded as formulas in propositional calculus. The first
implementation is a binary neural network. An analog neural network is used to include reasoning with
uncertainties. The second implementation uses a new optical symbolic substitution correlator. This
implementation is useful when a set of similar situations has to be handled in parallel on one processor.


I.  Introduction 

Several attempts have been made to apply optics to the field of symbolic computation. In one
case [1] a hybrid optical and electronic architecture was proposed to implement a subset of the
language PROLOG. The time-consuming parts of the PROLOG implementation are done optically, but
extensive electronic control is employed and multiplexed filters which we use were not employed.
This system is query- or goal-driven, while our production system is data-driven. This is not the
general inference machine (production system) we consider, but serves as a coprocessor to a
general-purpose electronic computer. An optical inference machine has been suggested [2] in which
different situations are represented as adjacency matrices (these concepts arise from directed graph
theory) and inferences are made by Boolean matrix addition and multiplication. This system can
handle only the simplest case of Boolean logic, namely, a → b (a implies b), and its extension to
handle subsequent inferences (which we consider) is not addressed. Architectures to extend the
storage capacity (the size of a fixed knowledge base) of such systems using multiple holograms [3]
have been described. Another production system [4] calculates probabilities associated with
different inferences and assumes that the assertions in the knowledge base are independent events
and that a linear relationship (a matrix) exists between the assertions (the input vectors) and the
consequents (output vectors). This allows inferences to be made by a single matrix-vector
multiplication followed by thresholding. Another similar expert or production system [5] allows the
knowledge base to be updated during learning and uses the Bayes theorem to update the
probabilities of consequents and rules and again uses optical matrix-vector multiplication to make
inferences. We find the assumptions of independent events or assertions (which allows multiplication
of probabilities when using the Bayes decision theorem) and mutually exclusive events (allowing the
addition of probabilities) unrealistic in the situations where these systems are used (such as in
medical diagnosis and vehicle control).

In this paper we show how we can use optics to implement a production system using two
different approaches: the first employs an artificial neural network structure and the second is based
on symbolic substitution. The neural network architecture employed is the well-known matrix-vector
multiplication followed by a nonlinear function and feedback. Neural network architectures have not
explicitly been used as iterative inference machines. However, Anderson [6] formulated an
autoassociative memory in which each of the inference rules (if-then-rules) is encoded as one
key/recall vector, with these vectors stored in an associative memory using the conventional
outer-product formulation. When this system is presented with a partial input vector (only the
antecedent elements), it iterates until the full vector is reconstructed (i.e., the consequent part is
obtained). This system differs from ours, since it iterates to invoke one rule and no subsequent rules
are examined. Thus, our use of a matrix-vector neural network as an iterative inference engine is
new. The matrix and vector can be partitioned into facts and rules for different nearly uncoupled
situations associated with a general problem. In this sense, the system is modular. This aids learning,
since the system need not know and thus make all inferences in one pass through the system (as is
required in a noniterative system). Rather, some inferences are made in each cycle and then other
inferences are made on subsequent cycles using recently substantiated inferences. The nonlinearity
present allows the system to model interdependent rules (that depend on several input facts; e.g., if
two wheels and a rectangle are present, the object is a car; if two wheels and a person are present, it
is a bicycle). In the prior examples, there is a large increase in confidence when the evidence that a
rectangle or a person is present is included and this is modeled by the nonlinearity in the system. The
nonlinearity thus allows the system to handle such situations, where the combination of the initial
pieces of evidence is greater than the sum of their individual contributions.

Both of the systems we consider use a local representation of facts (one pixel per fact) and are
thus less fault tolerant than a distributed representation (each fact is distributed to several inputs),
since the loss of one input pixel will completely eliminate one fact. We also only consider the optical
realization of a fixed set of rules that are known a priori. This allows the use of fixed filters and
interconnection masks, which is presently more realistic than optical systems requiring adaptive
filters and interconnections. However, our system allows new rules to be inferred that are not
explicitly encoded.

In Sec. II we define our terms and the initial scenario addressed. In Sec. III we describe a binary
neural architecture that can handle only binary (yes/no) type decisions and an analog neural system
that includes the calculation of probabilities (without assuming mutually exclusive events and/or
independent assertions). Section IV describes a symbolic substitution production system and a
scenario where several similar parallel situations occur. In this case, a symbolic substitution
implementation is preferable. We conclude with a discussion in Sec. V.

II.  Definition  of  Terms  and  Scenario 

In artificial intelligence, a production system consists of data, operations, and control [7]. In real
systems, these distinctions often become fuzzy. In the production system we consider, the data are
referred to as a knowledge base, consisting of a set of facts and a set of inference rules (also called
production rules), that constitutes the operations performed on the facts. The system control
handles the order in which the rules are invoked. In a data-driven system, the control selects the
rules to be tested. These are the rules associated with present facts/data. This process continues
until the goals are satisfied. In a goal-driven system, the control first selects the rules associated
with the goals and proceeds with the inferences to arrive at the present facts/data (if possible). In
the applications we have in mind, we refer to the production system as an inference engine, an expert
system, or a rule-based system, without any fine discrimination between these terms.

We  choose  propositional  calculus  as  the  formalism 
in which to represent the knowledge base of the production
system. Propositional calculus is a subset of
predicate calculus, which is a formal language. The
elementary components of predicate calculus (Ref. 7) are
symbols that are combined to form what are called
atomic formulas. Sentences or expressions in predicate
calculus consist of connected atomic formulas and
are called well-formed formulas (wffs). In predicate
calculus, expressions can contain variable symbols (among
others) and four connectives, and quantification is
allowed. Propositional calculus allows the use of the
same four connectives between atomic formulas as
predicate calculus, namely, AND denoted by ∧ (which
forms a conjunction), OR denoted by ∨ (forming a
disjunction), IMPLIES denoted by → (which forms an
implication), and NOT denoted by ~ (forming a negation).
The connection (i.e., the result of applying any
of the connectives: conjunction, disjunction, negation,
or implication) of any number of wffs is also a wff.
In an implication, the left-hand side is called the antecedent
and the right-hand side the consequent.
Atomic  formulas  in  propositional  calculus  contain  only 
predicate,  function,  and  constant  symbols  (the  use  of 
variable  symbols  such  as  x  is  precluded).  To  illustrate 
the  meaning  of  these  symbols,  consider  the  fact  “Dave’s 
student wrote a paper.” This sentence would be represented
as the atomic formula WROTE[PAPER, student(DAVE)].
Here “wrote” is the predicate symbol that
represents a relationship; “paper” and “dave” are
constant symbols and represent objects or entities;
“student” is a function symbol that maps constant
objects or entities to one another. Propositional calculus
was chosen rather than predicate calculus since
its operations can be realized more easily on the architectures
we consider. We will refer to the atomic
formulas  as  “facts”  and  the  implications  (containing 
any  of  the  symbols  and  connectives)  as  “rules”  in  the 
knowledge  base  we  discuss. 

The  types  of  implications  or  rules  we  consider  are 
called if-then-rules. Our knowledge base contains a
set of statements (assertions or facts) that are true at
any specific point in time and a set of if-then-rules that
are true at all times. The facts represent the current
world state and are considered the explicit declarative
knowledge of the system. The rules encode the general
knowledge about the world and are the implicit
declarative  knowledge.  The  production  system 
makes  inferences  based  on  both  the  facts  and  the  rules. 
If  any  new  assertions  become  true  during  one  cycle  of 
the  inference  process,  they  are  included  as  facts  in  the 
input  to  the  system  during  the  next  inference  cycle. 

The production system we discuss is a forward-chaining
or data-driven system, as opposed to a backward-chaining
production system that is goal-driven.
In  any  production  system,  a  set  of  goal  formulas  is 
defined.  When  any  one  of  the  goal  formulas  becomes 
true,  the  system  has  reached  its  final  state  and  the 
inferences  stop.  In  a  forward  system,  the  initial  state 
of the system is set to the facts in the database that are
true  and  the  system  is  allowed  to  make  inferences  until 
one  of  the  goal  formulas  becomes  true. 

We  restrict  the  rules  in  our  knowledge  base  to  be  of 
the  form 

For rule r:

IF (a_m AND ... AND a_n)
OR (a_p AND ... AND a_q)        (1)
OR ...
THEN c_r

where the facts a_i can be negations and where the sets
{a_m to a_n}, {a_p to a_q}, etc. are not necessarily disjoint (to
allow the OR statement description above). The set
elements a_m, etc. are the antecedents for rule r and c_r is
the consequent. An example of a knowledge base in a
forward  production  system  is  shown  in  the  graph  or 
decision net in Fig. 1. Every possible fact a_i or c_r in the
knowledge base is represented by at least one node (a_1
to a_10 and c_1 to c_3), although Fig. 1 shows only one node
per fact. The consequents of the rules are represented
by the bottom nodes (c_1 to c_3) and the antecedents by
all the other nodes above them (a_1 to a_10). A specific
rule r is thus represented by the OR of the different
paths to a consequent node c_r, with each path being the
AND of a set of antecedents as in Eq. (1). For the
example in Fig. 1, rule 1 is

IF (a_1 AND a_5)
OR (a_1 AND a_6 AND a_9)
OR (a_2 AND a_6 AND a_9)
THEN c_1.

As  Fig.  1  shows,  our  knowledge  base  with  rules  in  the 
form  of  (1)  will  not  have  a  tree  structure.  Rather,  as  in 
Fig.  1,  there  may  be  several  root  nodes  that  lead  to  the 
same intermediate node (e.g., a_2 and a_4 can both lead to
the intermediate node a_7), and intermediate nodes
that lead to one another (e.g., a_6 to a_10) and/or to the
same bottom nodes (e.g., a_5 and a_9 lead to c_1). The
knowledge base in Fig. 1 is completely described by the
three rules for the three consequents. Thus it can be
implemented in one cycle (matrix-vector multiplication).
However, if c_1 = a_3, the iterative nature of our
production system and its ability to make many new
inferences from a limited number of facts and rules can
be seen. Specifically, if c_1 is true and c_1 = a_3, the
system learns other rules (e.g., a_1 AND a_5 AND a_3 AND a_8
→ c_3, a_1 AND a_6 AND a_9 AND a_3 AND a_8 → c_3, and a_2 AND
a_6 AND a_9 AND a_3 AND a_8 → c_3) on successive iterations
(without having been told these rules explicitly).
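
The iteration described above can be illustrated with a short software sketch. It is only an illustration under stated assumptions, not the optical system itself: the function name forward_chain and the dictionary layout are hypothetical, rule 1 is taken with c_1 = a_3, and the form of rule 3 (a_3 AND a_8 → c_3) is assumed from the chained rules listed in the text.

```python
def forward_chain(facts, rules, goals=frozenset()):
    """Apply OR-of-AND rules until a goal is proved or nothing changes.

    rules maps each consequent to a list of conjunctions, where each
    conjunction is a set of antecedent names (hypothetical layout).
    """
    state = set(facts)
    cycles = 0
    while not (state & set(goals)):
        # Every rule is tested against the same world state (one cycle).
        proved = {c for c, conjunctions in rules.items()
                  if any(conj <= state for conj in conjunctions)}
        if proved <= state:
            break  # no new assertion became true
        state |= proved  # new consequents become facts for the next cycle
        cycles += 1
    return state, cycles


RULES = {
    # Rule 1 of the Fig. 1 example, with the consequent c_1 identified with a_3.
    "a3": [{"a1", "a5"}, {"a1", "a6", "a9"}, {"a2", "a6", "a9"}],
    # Assumed form of rule 3 (hypothetical, inferred from the chained rules).
    "c3": [{"a3", "a8"}],
}

state, cycles = forward_chain({"a1", "a5", "a8"}, RULES)
print(sorted(state), cycles)
```

Starting from a_1, a_5, and a_8, the first cycle proves a_3 and the second proves c_3, which corresponds to the chained rule a_1 AND a_5 AND a_3 AND a_8 → c_3 that the system was never given explicitly.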

As  a  specific  example  of  our  production  system,  we 
chose the problem of controlling a mobile robot (such
as the autonomous land vehicle, Ref. 8) in an urban area.
This example will allow us to detail specific rules (in
terms of objects such as cars rather than abstract elements
such as a_n and c_r) and to specify when a neural or
symbolic substitution production system is appropriate
(as we will detail). This is a case in which most of
us  are  experts  due  to  our  experience  in  driving  motor 
vehicles.  The  final  results  of  the  inferences  (the  goal 
formulas)  are  control  instructions  such  as  “slow 
down,” “blow the horn,” etc. We assume that the
vehicle is equipped with a Doppler radar that provides
velocity information and a video camera looking in the
same direction as a human driver of a vehicle usually
does. The video images are first processed by a symbolic
correlator as described elsewhere (Ref. 9) and summarized
below.

Fig. 1. Example of a decision net representation of a rule base.

The  symbolic  correlator  compares  the  image  with  a 
set  of  geometric  symbols  of  various  forms,  scales,  and 
rotations  and  outputs  a  symbolic  description  of  the 
objects  in  the  image  and  their  spatial  locations.  This 
denotes  which  geometric  symbols  are  present,  their 
scales, rotations, and relative positions. This correlator
is part of a symbolic preprocessing system that has
its  own  knowledge  base  and  that  infers  the  identity  of 
the  objects  in  the  field  of  view.  The  stored  geometric 
symbols  are  the  facts  in  the  symbolic  correlator’s 
knowledge  base.  Geometric  models  of  the  possible 
objects  in  terms  of  geometric  symbols  are  the  rules  in 
the knowledge base. The preprocessing symbolic
correlator  system  draws  conclusions  on  the  identity  of 
the  object  based  on  the  geometric  model  rules.  The 
emphasis in this paper is on the production system
that operates on the symbolic correlation object output
data and not on how the symbolic correlation data
are obtained (although an analog production system
could be used to obtain these data).

We assume three categories of facts in our production
system:

(1)  inputs  from  the  symbolic  correlator  (e.g.,  that 
the  obstacle  in  the  field  of  view  is  a  car,  plus  its  position 
and  orientation); 

(2)  inputs  such  as  the  velocity  of  the  objects;  and 

(3) control and monitoring signals from the vehicle’s
steering and acceleration subsystems (that give
the speed of the vehicle and its direction of travel).

The rules in our production system include knowledge
about how to steer and control the vehicle in the
presence of obstacles in different situations. As an
example  of  a  typical  situation,  consider  the  case  when 
the  vehicle  is  moving  along  smoothly  in  the  right  lane 
of  the  road  (which  is  indicated  by  a  curb,  i.e.,  a  solid 
line  and  some  elevation,  or  by  gravel,  i.e.,  a  change  in 
texture  on  the  right  side  and  a  broken  white  line  on  the 
left side). One such situation arises when another
car suddenly pulls out in front of the vehicle and moves
slower than the vehicle. In such a situation, appropriate
control signals or consequents of goal formulas
associated with this situation could be to blow the
horn, change lanes, slow down, stop, etc. The rules are
such that they use the specific symbolic preprocessor
outputs, the available sensors, and the control and
monitoring signals, e.g.,

IF  (object  in  field  of  view  is  a  car 

AND  on  collision  course  with  this  object 
AND  object  moving  slower  than  vehicle) 

THEN  slow  down. 


III. Neural Network Architectures for Production
Systems

We first consider the use of a synthetic neural network
to implement the production system. As discussed,
we use a forward-chaining system, i.e., a data-driven
production system. Such a system does not
know  in  advance  which  specific  goal(s)  to  look  for,  and 
thus  it  bases  its  inferences  on  its  current  known  state  of 
the  world  and  not  on  what  other  facts  are  necessary  to 
satisfy  a  particular  goal. 

A.  Binary  Neural  Network  System 

As an initial case, we consider binary-valued decisions
and logic statements, i.e., we do not consider
reasoning with uncertainties. We use a two-layer neural
network with binary neurons (neurons with outputs
that can take on only the values 0 and 1) to store the
rules and make the inferences. The input nodes (first
layer) represent the antecedents of the rules and the
output nodes (second layer) represent the consequents.
The antecedent nodes in the input are connected
to their consequent nodes in the output by links
of weight one (since uncertainties are not considered).
The links store the connectives (in our example, they
are only ANDs) of the rules. The output nodes are
nonlinear computing elements that sum their inputs
and threshold the result to give binary-valued outputs.
In a forward-chaining system, the initial outputs (on
the first iteration) prove certain consequents to be
true. These consequents then become facts or assertions
that are input to the system (by feedback) on the
second iteration. Thus, there must be an input and
output node for every possible assertion (all antecedents
and consequents) in this production system.

The  neural  production  system  is  best  seen  by  an 
example. Figure 2 shows the interconnection pattern
for the neural network for the example in Fig. 3. This
problem can be posed as a matrix-vector multiplication
if we assign all assertions (antecedents and consequents,
i.e., a through g) to different binary input and
output vector elements and if we describe the interconnections
(the left-hand side expressions in Fig. 3) as
binary 2-D matrix elements. The matrix-vector multiplication
and subsequent thresholding then yield
the output vector of consequents. The output threshold
(denoted by T in Fig. 2) varies from element to
element, depending on the number of antecedents in
the corresponding rule. By replicating several connections,
all output neurons can be made to have an
equal number of inputs and thus a uniform output
threshold can be used and all interconnections still
have the same strength. This approach circumvents




Fig.  2.  Neural  network  for  the  rules  in  the  example  in  Fig.  3. 


a → b

a AND c AND f → g

b-fr*  a 


f AND g → c

Fig.  3.  Rules  for  the  neural  network  example  in  Fig.  2. 


the need that other neural network architectures have
of requiring varying thresholds or nonuniform interconnection
strengths. The antecedent (input) and
consequent (output) vectors together constitute the
current world state, which is a list of all the true assertions
at a given point in time, i.e., after one pass
through the system. In the example in Fig. 2, if, at
cycle or iteration n, the vector elements representing
antecedents a, c, and f are all 1, the vector element
representing the consequent g will be 1 after multiplication
and thresholding. The resulting consequent
vector after cycle n could be logically ORed with the
prior input vector, to provide the new input for the
next pass (cycle n + 1) through the system. The new
input thus has all the new consequents obtained in
pass n included as facts or true antecedents for cycle
n + 1 plus the old true antecedents. This is necessary,
since some of the consequents from one pass can be
antecedents for rules and thus new consequents may
be satisfied on a subsequent pass. We can simplify
this system and avoid the logical OR function by connecting
each input with its corresponding output neuron
with an interconnection strength equal to the output
threshold required or with multiple unit-weight
interconnections which equal the strength required to
fire the output neuron. This ensures that once a fact
(neuron) has been proved (fired), it remains true.
This is equivalent to having unit diagonal elements in
the interconnection matrix. With unit diagonal elements,
conventional neural networks such as the Hopfield
model cannot be shown to converge. However, as
noted in Sec. I, our production system application of
neural networks is different and thus is not subject to
the same constraint. This process (of matrix-vector
multiplication, thresholding and ORing) is iterated until
one of the goal formulas is proved or until no change
occurs in the output neuron states.

Fig. 4. Schematic of an optical matrix-vector implementation of
the neural network in Fig. 2.

We  can  conceive  of  situations  where  we  want  to 
deviate  from  the  constraint  of  unit  diagonal  elements 
for  all  the  facts.  For  example,  a  neuron  that  gets  its 
input  from  the  symbolic  correlator  output  should  only 
fire when the geometric symbol that it represents is
present and should stop firing when that symbol disappears.
Such a neuron should not have a connection
between it and the output neuron corresponding to it.
However, a fact such as “the final destination is Little
Rock, Arkansas” will only be true during initialization
of the network and the network will have to “remember”
it, i.e., the neuron representing it should keep
firing. This can only be achieved by a unit diagonal
element and feedback. Therefore, in the design
stages,  all  the  facts  that  have  to  remain  true  once  they 
are  fired  should  be  assigned  a  unit  diagonal  element. 
As  noted  earlier,  this  is  a  new  synthesis  method  for  the 
connection  matrix  suitable  for  our  application. 
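
The binary network and the role of the diagonal elements can be sketched in software as follows. This is a minimal sketch under assumptions of our own: the names build_network, step, and run are hypothetical, each consequent has a single conjunction, only three rules of the Fig. 3 type are used (a → b, a AND c AND f → g, f AND g → c), and a persistent fact receives a self-connection whose strength equals its own threshold.

```python
def build_network(assertions, rules, persistent):
    """Return (W, T): interconnection matrix rows and per-neuron thresholds.

    rules maps a consequent to its single conjunction (a set of antecedents).
    persistent lists the facts given a self-connection of strength T.
    """
    index = {name: k for k, name in enumerate(assertions)}
    n = len(assertions)
    W = [[0] * n for _ in range(n)]
    T = [1] * n
    for consequent, antecedents in rules.items():
        j = index[consequent]
        T[j] = len(antecedents)  # threshold = number of antecedents
        for a in antecedents:
            W[j][index[a]] = 1   # unit-weight link
    for name in persistent:
        j = index[name]
        W[j][j] = T[j]           # once fired, the neuron keeps firing
    return W, T


def step(W, T, x):
    """One cycle: matrix-vector product followed by thresholding."""
    return [1 if sum(w * xi for w, xi in zip(row, x)) >= T[j] else 0
            for j, row in enumerate(W)]


def run(W, T, x, max_cycles=100):
    """Feed the output back as the next input until the state is stable."""
    for _ in range(max_cycles):
        y = step(W, T, x)
        if y == x:
            break
        x = y
    return x


NAMES = list("abcdefg")
RULES = {"b": {"a"}, "g": {"a", "c", "f"}, "c": {"f", "g"}}

W, T = build_network(NAMES, RULES, persistent=NAMES)
final = run(W, T, [1, 0, 1, 0, 0, 1, 0])  # a, c, and f initially true
print([n for n, v in zip(NAMES, final) if v])
```

Leaving a name out of the persistent list models a sensor-driven fact: its neuron fires only while the input is applied, whereas consequents proved from it remain true through their own diagonal elements.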

Figure 4 shows the schematic of an optical implementation
of the neural net in Fig. 2. We use an
optical binary vector-matrix multiplier with a 1-D
array of laser diodes at P1 (for the input vector elements),
a 2-D matrix of rules at P2, and a linear array of
detectors at P3 (whose vector output is the matrix-vector
product). After thresholding, these P3 outputs
yield the neuron states (and the vector for the next
iteration). The matrix elements (interconnections)
are fixed (and binary) so that film can be used at P2.
The thresholding of the P3 result can be performed
electronically in the feedback loop to the input P1 laser
diode array or can be included optically on the output
P3 device. The output vector signals can be fed back
electronically or optically with mirrors or optical fibers
(in an all-optical system). Figure 4 indicates electronic
thresholding and feedback. The optical connections
are easily achieved. The light from P1 is spread
out horizontally by cylindrical lens L1 to uniformly
illuminate the rows of the 2-D array in P2. The array
at P2 has the connections from P1 to P2 encoded on it
as a matrix. The light leaving P2 is the point-by-point
product of the input P1 vector and each of the column
vectors in P2. These products are integrated
(summed) vertically and imaged horizontally by cylindrical
lens L2 onto the 1-D detector array in P3. The
driver circuits for the input laser diodes could have
memory (so that once a laser diode is on, it stays on;
with nonzero diagonal interconnection matrix elements
this is not necessary). With this system, all
possible inferences based on the current world state
are made in parallel. This produces a new world state
on which all inferences are made in parallel, etc. If the
spatial light modulator at P2 has 1000 × 1000 elements,
the input P1 and output P3 vectors have 1000
elements and the capacity of the production system is
1000 assertions.

B.  Analog  Neural  Network  System  (Including 
Probabilities) 

We now consider the case of reasoning with uncertainties,
i.e., when probabilities are associated with the
antecedents and/or the consequents of the rules. The
probabilities are heuristic and are derived from our
general knowledge about the world. This case can be
handled by an analog neural network. The input and
output neurons (vector elements) are now analog (not
binary as in the previous discussions) and can assume
any real value between 0 and 1. Each neuron represents
a fact (input antecedent or output consequent)
and its output (the amount by which it is firing) is the
probability of that fact being true. The connection
strengths (matrix elements) between the input neurons
(the antecedents of the rules) and the output
neurons (consequents) represent the contributions of
the antecedents to making the different consequents
true. The outputs from the antecedent neurons are
multiplied by the connection strengths and a nonlinear
operation is performed on the sum at each of the output
neurons. The nonlinear function can be a sigmoidal
function f(x) = 1/[1 + exp(-αx)], where α determines
the slope of the nonlinearity. In the limits, the
sigmoidal function can be a step (α → ∞) or approximately
linear when α is made very small compared to
the inverse of x in the region of interest.

Let w_i be the probability of antecedent i being true
(the amount of trueness of fact i) and let p_j(w_i = 1) be
the probability that consequent j is true given that
antecedent i is true. For illustration, consider a rule in
the knowledge base that infers that the object in the
image is a truck if there are two circular and one
rectangular geometrical symbols present. Let w_1 represent
the probability that a circular symbol is present,
let w_{1,1} be the probability that two circular symbols are
present, and let w_2 be the probability that a rectangular
symbol is present. Also let p_1 be the probability
that the object is a truck. Let us heuristically assign a
probability of 0.1 that the object is a truck if there is
one circular symbol present, a probability of 0.2 if only
a rectangle is present, and a probability of 0.8 that the
object is a truck if two circular symbols and a rectangle
are all simultaneously present. In our notation, the
probabilities are written as

p_1(w_1 = 1) = 0.1,
p_1(w_2 = 1) = 0.2,
p_1(w_{1,1} = 1; w_2 = 1) = 0.8.

Note that p_1(w_{1,1} = 1; w_2 = 1) = 0.8 is not equal to
2p_1(w_1) + p_1(w_2) = 0.4. Thus, the probability of a
truck increases nonlinearly as more evidence is gained
(i.e., the probability of a truck, when three of its geometrical
components are present, is larger than the
sum of the probabilities when each of the individual
components is present). Thus a nonlinear function is
needed to map the inputs to the proper output neuron
strength.

Figure  5  shows  the  interconnection  strengths  and 
operations  required  for  a  general  neural  network  for 
reasoning with uncertainties. The strengths of the
output (consequent) and input (antecedent) neurons
are given by c_j and a_i, respectively. The connection
strength between a_i and c_j is denoted by w_ij and the
strength of a given output element is c_j = f(Σ_i w_ij a_i),
where f denotes the output nonlinear function and the
sum is over all assertions. Let us now detail how
analog values might be assigned to the strength a_i of
the input neurons. Each fact, such as “a circular symbol
is present,” is associated with one of the input
neurons. If these facts are obtained from a symbolic
correlator and if the correlation peak value is 0.6, we
would assign a certainty of 60% to that statement and
the output strength of this neuron would be 0.6. The
inputs to a second-layer neuron are the products of the
outputs of the input first-layer neurons and the
weights that connect them to that output neuron.
Each output neuron forms the sum of its inputs and
performs the nonlinear operation f on the resultant
sum. These outputs indicate the probabilities that
the associated consequents are true. The slope of the
nonlinear function and the weights connecting the input
and output neurons are used to provide an output
c_j equal to the proper analog probability. Since it is
preferable to have the same nonlinear function for all
output neurons, we fix the output nonlinear function
and vary the interconnection weights w_ij to achieve c_j
outputs equal to the desired probabilities. In this
analog neural network, the optical mask at P2 of Fig. 4
must have analog transmittances.
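
One way the weights might be chosen for a fixed nonlinearity can be sketched as follows. This is an illustrative sketch only: the helper names are hypothetical, the three facts (one circle, two circles, rectangle) are treated as separate input neurons, the "one circle" neuron is assumed not to fire when two circles are reported, and the weights are obtained by inverting the sigmoid. The resulting weights can be negative, and with no offset term a zero input gives an output of 0.5; how these cases are handled is not stated in the text.

```python
import math


def f(x, alpha=1.0):
    """Sigmoidal output nonlinearity f(x) = 1 / [1 + exp(-alpha * x)]."""
    return 1.0 / (1.0 + math.exp(-alpha * x))


def outputs(W, a, alpha=1.0):
    """c_j = f(sum_i w_ij * a_i), with W[j][i] holding w_ij."""
    return [f(sum(w * ai for w, ai in zip(row, a)), alpha) for row in W]


def inverse_f(p, alpha=1.0):
    """Argument x for which f(x) equals the probability p (0 < p < 1)."""
    return math.log(p / (1.0 - p)) / alpha


# Input order (assumed): [one circle, two circles, rectangle].
w_circle = inverse_f(0.1)                # one circle alone -> 0.1
w_rectangle = inverse_f(0.2)             # rectangle alone -> 0.2
w_two_circles = inverse_f(0.8) - w_rectangle  # two circles + rectangle -> 0.8
W = [[w_circle, w_two_circles, w_rectangle]]  # single output neuron: truck

print(outputs(W, [1, 0, 0]))
print(outputs(W, [0, 0, 1]))
print(outputs(W, [0, 1, 1]))
```

An input strength below 1, such as a correlation peak value of 0.6, is passed in the same way, e.g., outputs(W, [0.6, 0, 0]).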

IV.  Symbolic  Substitution  Production  System 

A forward-chaining production system can also be
implemented using symbolic substitution. The use of
a symbolic substitution production system is appropriate
when a set of similar situations (such as the control
of a fleet of vehicles) has to be handled in parallel on
one processor (as we will detail later). Symbolic substitution
(Ref. 10) involves two steps: the recognition of an
input symbol and the substitution of an output symbol
for that input symbol. This pair of input-output symbols
is called a substitution rule. The set of substitution
rules specifies the symbolic substitution system.
The implementation of symbolic substitution we consider
uses two cascaded optical correlators as shown in
Fig. 6. In this architecture, frequency-multiplexed
matched spatial filters of the possible input symbols
are placed in P2 and the first correlation plane P3 has


Fig. 5. Interconnection strengths and operations required in a
neural network for reasoning with uncertainties.

Fig. 6. Cascaded optical correlator system for symbolic substitution.


peaks in different regions denoting where the different
symbols are in the input plane P1. We threshold P3 to
provide delta functions at the locations of the input
symbols. P3 serves as input to the second correlator.
The Fourier transforms of the output symbols to be
substituted are spatially multiplexed at P4. This
achieves the substitution of the output symbol by convolution
with the delta functions. The different spatial
frequency carriers at P4 produce the sum of the
different substituted patterns at P5 as is desired. This
system is detailed elsewhere (Refs. 11 and 12). The typical use for
these systems has been in optically performing logic
and numeric functions. We now consider its use in a
new production system application.
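
The two steps of a single substitution rule (recognition by correlation and thresholding, then substitution by convolution with the resulting delta functions) can be mimicked digitally. The sketch below is a hypothetical software analog, not the optical correlator: the function names are ours, the patterns are small binary arrays, the output pattern is assumed to be no larger than the input pattern, and the correlation only checks that all the 1s of the input pattern are present.

```python
def correlate(image, pattern):
    """Valid-region correlation of a binary image with a binary pattern."""
    rows, cols = len(image), len(image[0])
    h, w = len(pattern), len(pattern[0])
    return [[sum(image[r + i][c + j] * pattern[i][j]
                 for i in range(h) for j in range(w))
             for c in range(cols - w + 1)]
            for r in range(rows - h + 1)]


def substitute(image, rule_in, rule_out):
    """Apply one substitution rule (rule_in -> rule_out) over the image."""
    peak = sum(map(sum, rule_in))       # full-match correlation value
    corr = correlate(image, rule_in)    # first correlator (recognition)
    out = [[0] * len(image[0]) for _ in image]
    for r, row in enumerate(corr):
        for c, value in enumerate(row):
            if value >= peak:           # thresholding leaves a delta here
                # Second correlator: place the output symbol at the delta.
                for i, out_row in enumerate(rule_out):
                    for j, v in enumerate(out_row):
                        out[r + i][c + j] |= v
    return out


IMAGE = [[1, 0, 0, 0],
         [0, 1, 0, 0],
         [0, 0, 0, 0]]
RULE_IN = [[1, 0],
           [0, 1]]
RULE_OUT = [[0, 1],
            [1, 0]]

for line in substitute(IMAGE, RULE_IN, RULE_OUT):
    print(line)
```

Several rules would be handled by applying substitute once per rule and combining the outputs, which corresponds to the sum of substituted patterns formed at P5.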

A.  System  Overview 

We first overview our symbolic substitution production
system concept. For our production system application,
the input to the symbolic substitution machine
is a coded image representing the facts (or
assertions) that are true at this point, with each possible
assertion represented as a specific symbolic pattern
in a specific location. An assertion is zero until it
becomes true. The system implements the if-then-rules
in the knowledge base by looking at which antecedents
(symbols) are present in the input and substituting
the symbols for the consequents of those rules
whose antecedents are present. We substitute only
for assertions that were not true previously and do not
change symbols for assertions that were previously
true since they are still true. By changing zero symbols
the system thus adds the new facts (consequents)
that become true at the invocation of the rules. As
noted earlier, the consequents are placed at their proper
locations in the output

B. Filter Organization

Symbolic  substitution  logic  and  numeric  processors 
(such  as  Fig.  6)  substitute  patterns  in  the  same  location 
as  the  input  patterns.  However,  since  our  symbolic 
substitution  production  system  substitutes  output 
symbols  in  different  locations  from  those  of  the  input 
symbols,  the  second  correlator  in  the  cascade  in  Fig.  6 
will be modified from its prior versions (Refs. 11 and 12) as we discuss
in Sec. IV.C. As an example, consider the case
when input and output facts (i.e., antecedents and
consequents) are the same (e.g., our c_1 = a_3 example in
Fig. 1). In this instance, we assign the same location
and symbol to both. If both a fact and its negation
enter as antecedents a_n, each is assigned a separate
location  and  symbol.  We  also  allow  separate  locations 
for  objects  in  different  positions  in  a  scene  (e.g.,  the  car 
is  in  front,  back,  left,  or  right  in  our  mobile  vehicle 
example). 

We now discuss the specific organization of the filters.
Our rules are in the form of the OR of antecedent
sets (each of which is the AND of a set of antecedents),
i.e.,

IF (a AND b)
OR (c AND d)        (2)
THEN x.

In the example in (2), there are several antecedent sets
(which we refer to as conjunctions). Each of these
must be viewed as a separate filter at P2 of Fig. 6, each
conjunction has a separate location assigned to it in P3,
and all conjunctions associated with the same consequent
are assigned to the same region (output correlation
plane) in P3. The filter implements the logic AND
operation directly (rather than using the symbolic substitution
system to implement logic functions). We
now detail these issues. The rule in (2) can be rewritten
as two rules with the same consequent

a AND b → x        (3)
c AND d → x.

Each rule is represented by a separate filter function
with the filters being the conjunctions a AND b and c
AND d. Thus, the two filters each contain both antecedents
and their locations properly encoded and the
logic AND operation is included in the filter design. To
see why separate filters and P3 locations are necessary
for each conjunction, recall that several antecedents
exist within each conjunction and several conjunctions
yield the same consequent as in (2) and (3). If the
same P3 location were assigned to the two conjunctions
in (2) or (3), then if only the antecedents a and c (i.e.,
parts of each of the two conjunctions) were present, the
P3 output would exceed threshold (by summing partial
correlations from several conjunctions).
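
The false detection that a shared P3 location would cause can be shown with a simple count. In this sketch (a simplified model of our own, not the optical filter), each antecedent that is present contributes one unit to the correlation value of its conjunction, and a conjunction is declared true when the value reaches the number of its antecedents.

```python
CONJUNCTIONS = [{"a", "b"}, {"c", "d"}]  # the two conjunctions of rule (2)
THRESHOLD = 2                            # antecedents per conjunction


def separate_locations(present):
    """Each conjunction is thresholded at its own P3 location."""
    return any(len(conj & present) >= THRESHOLD for conj in CONJUNCTIONS)


def shared_location(present):
    """Partial correlations of all conjunctions add at one P3 location."""
    total = sum(len(conj & present) for conj in CONJUNCTIONS)
    return total >= THRESHOLD


for facts in ({"a", "b"}, {"a", "c"}):
    print(sorted(facts), separate_locations(facts), shared_location(facts))
```

With only a and c present, the shared location reaches threshold although neither conjunction is satisfied, whereas the separate locations correctly give no detection.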

The organization of the rules as the OR of antecedent
sets of AND operations was chosen since it is computable
with the production system implementation described
above. The optical filters perform the AND
(and hence the conjunction operation) by addition on
the P3 thresholding array device. We arrange the
location of the output correlation peaks such that conjunctions
with the same consequent lie in different
locations in the same region of P3. If a peak exceeds
threshold anywhere within one of these P3 regions, the
second correlator (P3 to P5) substitutes the proper
consequent symbol in its proper location in P5. Once
the symbolic substitution has been done at P5 for one
pass through the system, the P5 outputs are fed back
and ORed with the prior P1 data to produce the new
antecedents that are now true for the next pass
through the system. Thus, this architecture also iterates
similarly to the neural network system. We now
detail the filter design, symbolic patterns used, and a
new optical architectural implementation.

C.  Filter  Implementation  and  Symbolic  Pattern  Selection 

Since the rules (separate conjunctions are separate
rules) are not linear, we cannot implement the symbolic
substitution in one correlator; rather, we require a
cascaded correlator such as in Fig. 6. The first correlator
(P1 to P3) achieves the recognition and conjunction
of each antecedent set and each conjunction has a
specific location in P3 assigned to it (i.e., a correlation
peak will appear at a specific location in P3 if a given
conjunction is true). This is easily achieved since the
location of each antecedent in P1 is known, fixed, and
specified. Thus the P2 filters include this positional
P1 information and the carrier spatial frequency for
each filter is chosen to select the P3 region of the
correlation peak. The specific location of the correlation
peak in the region (output correlation plane) is
determined by the correlator. We select the positions
of the antecedents in P1 such that antecedents that are
members of conjunctions associated with the same
consequent lie in the same region in P3. This reduces
the range of spatial frequencies required on the P4
filters (discussed later) to read out the same consequent
pattern for each conjunction in the same region
of P3. Since the locations of the antecedents in P1 are
known, the locations of the correlation peaks in P3 can
be specified. The P2 filters can be spatially or frequency-multiplexed
or a combination of spatially and
frequency-multiplexed P2 filters can be employed to
achieve the required results. The substitution filters
at P4 (in the second correlator) are required to substitute
(activate) the symbol (in a specified location in
P5) for any consequent that is now true or instantiated
(given the present world state of antecedents). Since a
peak (after P3 thresholding) anywhere in any of the P3
regions (corresponding to different consequents)
should activate the associated consequent symbol, we
encode the different consequent symbol patterns on
different sets of spatial frequencies at P4. The spatial
frequencies used for the filters of one consequent are
determined  from  the  positions  in  P3  of  correlation 
peaks  (corresponding  to  different  conjunctions  that 
yield  the  same  consequent).  Thus,  a  correlation  peak 
in  any  of  a  number  of  specified  locations  in  P3  yields 
activation  of  the  symbol  for  that  consequent  at  the 
specified  PS  location.  The  instantiated  consequents 
are  superimposed  at  PS  with  the  PS  threshold  set  to 
implement  the  OR  function  in  rules  of  the  form  of  (2). 
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The  P5  data  are  then  fed  back  to  PI  to  produce  a  new 
world  state  of  antecedents  for  the  next  iteration.  If 
the  spatial  light  modulator  at  Pi  has  memory,  the  P5 
data  can  be  simply  fed  back  iteratively  to  Pi  to  form 
the  new  antecedents  (the  OR  of  the  prior  facts  and  the 
new  consequents/facts).  This  use  of  symbolic  substi¬ 
tution  differs  from  prior  work  (involving  the  imple¬ 
mentation  of  logic  and  numeric  functions)  since  the 
output  symbols  are  substituted  in  different  locations, 
not  in  situ. 

We now consider the symbolic patterns used for the antecedents and consequents. If simple
points in specified P1 locations (rather than patterns) are used to denote true antecedents,
a set of input points with given relative spatial locations can instantiate an output P3
correlation peak (regardless of the absolute location of the set of points in P1). This will
yield false peaks in P3 at locations proportional to the position of the set of points in
P1. In this instance, this is a disadvantage of the use of a correlator when the locations
of input facts are known (i.e., a correlator automatically searches all absolute locations
of antecedent set patterns, even though this is not necessary here). The possible
cross-correlation effect of this at P3 can be reduced by using symbolic patterns (not
points) for each fact. Two choices exist for these symbolic patterns. We can use binary
patterns (i.e., use $\log_2 N$ pixels to represent the patterns for N facts). This requires
$N \log_2 N$ pixels in P1 for N facts (antecedents and consequents). As an alternative, we
can use orthogonal patterns for the symbols for the N facts. This will remove cross
correlations at the peak location at the cost of an increase in the space-bandwidth product
(SBWP) required at P1. Specifically, the use of orthogonal symbolic patterns requires N
pixels per symbol or $N^2$ pixels to encode N symbols (antecedents and consequents),
compared with $\log_2 N$ pixels per symbol or $N \log_2 N$ pixels to encode N symbols with
binary patterns. Thus, the use of binary symbols reduces the required SBWP at P1 by the
ratio $(\log_2 N)/N$. For large N, this can be significant (e.g., for $N = 1000$, the use of
orthogonal symbolic patterns increases the SBWP required at P1 by a factor of 100). Hence,
the use of binary-encoded symbolic patterns is preferable from practical implementation
considerations. This choice seems realistic, since the probability is small of
simultaneously finding M instantiated antecedents (whose relative spatial locations are the
same as that of one of the possible correct conjunctions) whose correlation with the P2
filter for that conjunction would yield a correlation peak in P3 at an allowed location, and
whose patterns each have a high cross correlation with the corresponding true set of
antecedents. Thus, we adopt binary-encoded symbolic patterns and consider orthogonal
patterns only when the device at P1 can accommodate the increased SBWP.
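
The pixel counts quoted above can be checked with a few lines of Python. This is only an arithmetic sketch; the function names are hypothetical, and rounding $\log_2 N$ up to a whole number of pixels is an assumption made here, not something stated in the text.

```python
import math


def pixels_binary(n_facts):
    """Pixels needed with binary-encoded symbols: N * ceil(log2 N)."""
    bits = max(1, math.ceil(math.log2(n_facts)))  # assumed: whole pixels only
    return n_facts * bits


def pixels_orthogonal(n_facts):
    """Pixels needed with orthogonal symbols: N pixels per symbol, N^2 total."""
    return n_facts * n_facts


n = 1000
ratio = pixels_orthogonal(n) / pixels_binary(n)
print(pixels_binary(n), pixels_orthogonal(n), ratio)  # 10000 1000000 100.0
```

For N = 1000 this reproduces the factor of 100 stated in the text.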

Two other issues associated with the symbolic pattern selection merit attention. Since the
locations of possible correlation peaks in P3 are known, we are not concerned with
cross-correlation values associated with the correlation of a reference and shifted versions
of the input symbols [since we will utilize a mask of apertures (holes) in front of P3 with
apertures placed only at the locations of possible legitimate conjunctions]. Other
researchers considering symbolic substitution correlators have expressed concern about this.
A second issue is the need for the different symbolic patterns to have the same energy (same
number of black and white pixels). This is not essential, since the weight of the different
filters can be adjusted to yield the same correlation peak height, even if the energy of the
N symbolic patterns differs. However, when cross correlations are considered, their
normalization with respect to all possible input patterns is not easily achieved. Thus, it
is preferable for each symbolic input pattern to be the pattern plus its conjugate. This is
detailed elsewhere91 and requires an increase in the SBWP of the symbolic patterns by a
factor of 2. This also solves the cross-correlation problem at peak locations discussed
earlier, since it assures that every conjunction now has a unique active (white pixels)
symbolic pattern. In this case, the cross correlation of any two patterns will differ by at
least 1/N from the autocorrelation and can thus be removed by thresholding.
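
A small Python sketch can illustrate why appending the conjugate equalizes energy and separates cross correlations from the autocorrelation. It assumes that "conjugate" means the bitwise complement of a binary pattern and that correlation at the peak location is a simple inner product of white (1) pixels; both are assumptions for illustration, as are all names below.

```python
def with_complement(bits):
    """Append the complement of a binary pattern to itself."""
    return list(bits) + [1 - b for b in bits]


def inner_product(p, q):
    """Zero-shift correlation value of two binary patterns (white = 1)."""
    return sum(a * b for a, b in zip(p, q))


n_bits = 3
# All 2^n_bits binary codes, one per (hypothetical) fact.
codes = [[(k >> i) & 1 for i in range(n_bits)] for k in range(2 ** n_bits)]
patterns = [with_complement(c) for c in codes]

energies = {sum(p) for p in patterns}  # number of white pixels per pattern
auto = inner_product(patterns[0], patterns[0])
max_cross = max(
    inner_product(p, q)
    for i, p in enumerate(patterns)
    for j, q in enumerate(patterns)
    if i != j
)
```

In this sketch every pattern has the same energy (`n_bits` white pixels), and the largest cross correlation falls one pixel short of the autocorrelation, so a threshold between the two separates them. How this one-pixel gap maps onto the 1/N figure quoted above depends on the normalization used in the source.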

D.  Alternative  Optical  Production  System  Symbolic 
Substitution  Architectures 

Since the shift invariance of a correlator is not necessary, one could implement the
production system with the P1 to P3 system being a matrix-matrix or vector inner product
(VIP) multiplier. If symbolic patterns were not used for the P1 facts, such a system would
be identical to the neural network architecture of Sec. III. The disadvantage of using a VIP
processor is that multiple situations (i.e., control of a fleet of autonomous vehicles)
cannot be easily handled and that the basic cascaded optical correlator architecture would
be altered (and thus the same optical system would not allow multifunctional use for logic,
numeric, morphological, and production system applications). To retain the ability of this
system to handle multiple situations (e.g., a fleet of vehicles), we must retain the shift
invariance of a correlator. However, with facts encoded on one of several lines in P1, we
require only vertical shift invariance. Thus, the use of a 1-D vertical correlator would
suffice and reduce cross-correlation peak intensities (since horizontal shifts of P1
patterns would now not contribute to false correlation peaks).

We now advance a preferable new optical symbolic substitution production system
architecture (Fig. 7) for this case in which the location of all possible facts is fixed
(position encoded). In this case, we use a binary spatial light modulator (SLM) at P1 with
one pixel per fact. This significantly reduces the SBWP requirements for P1. We use the
position encoding of P1 symbols to encode the symbols for each fact with a fixed in situ
mask at P2 placed directly behind P1. Thus, we need only activate one pixel on the binary
SLM at P1 and this automatically inputs a 2-D symbolic pattern to the system (with the
symbolic pattern provided by the fixed film P2 mask, thus reducing the SBWP requirements for
realistic real-time SLMs at P1). This new architecture allows a larger production system
(one with more facts) to be realized, without wasting real-time SBWP at P1 for the symbolic
pattern encoding. The system of Fig. 7 also uses only one correlator, rather than a cascaded
correlator. In this case, the feedback from P3 to P1 requires a simple CGH, fiber-optics
connection, or electronic feedback system which activates a P1 pixel if any pixel in a given
P3 region is activated. This new Fig. 7 architecture thus requires a single three-plane
one-correlator system that is structurally similar to (and no more complicated than) the
neural network architecture of Fig. 4, but which allows multiple situation processing. The
neural network and symbolic substitution approaches are compared in Sec. V.

Fig. 7. New optical production system symbolic substitution architecture with reduced SBWP
requirements and fixed symbol mask.

V. Discussion

Now that both this symbolic substitution implementation (Sec. IV) and the neural network
realization (Sec. III) have been described, let us discuss when the symbolic substitution
implementation is preferable.

In the original symbolic substitution system of Fig. 6, each fact is encoded as a symbolic
pattern, whereas the neural network requires only one pixel per fact. Hence the SBWP
requirement of the original symbolic substitution system is larger than that of the neural
network. However, the new proposed optical symbolic substitution processor of Fig. 7 allows
equal input SBWP requirements for both approaches. In addition, if the SBWP requirement is
greater than the input 1-D SBWP, additional facts can be encoded more easily on separate
lines in the input of the symbolic substitution system (since it is a correlator) than in
the neural network system. The symbolic substitution system easily extends to accommodate a
large number of facts since it is a correlator. Conversely, the neural network system is not
shift invariant, and if more facts are included by adding additional rows of neurons, the
interconnection pattern required becomes much more complex (in general a 4-D matrix is
required to implement all possible 2-D interconnections). When several sets of similar
situations (each with the same sets of possible facts and rules) are to be processed, the
symbolic substitution correlator approach is clearly preferable (since its full correlation
capability and shift invariance can now be utilized; specifically, we can search M parallel
sets of input facts with the same set of rule filters), whereas the neural network requires
a significant increase in the dimensionality and complexity of the required interconnection
pattern. The symbolic substitution correlator also easily allows control over the output
correlation peak intensities, whereas the neural network system requires multiple replicated
inputs to easily allow uniform threshold values for the output neurons.

In terms of the number of facts that can be accommodated, near-term optical devices limit
both systems to $10^6$ facts (assuming 1000 × 1000 element binary input SLMs). This should
be suitable for many applications, with combinations of multiple parallel architectures
allowing extension to more facts. The storage capacity (number of rules that can be handled)
is limited by the capacity of the volume hologram in the symbolic substitution system. This
is expected to be less than the number of interconnections possible in the neural net
system. Both architectures allow new rules to be inferred that are not explicitly encoded.

Space and frequency-multiplexed correlators have already been fabricated. Even though they
are at a more mature level of development than are optical neural network architectures, it
would be premature to select the symbolic substitution architecture over the neural network
one for this application.

Thus, no definite conclusion is possible on the preferable choice of one system. This can
only be addressed after a detailed analysis of the mask required in the neural network and
the positioning requirements of the symbolic substitution correlator.

Abstract

Pattern  recognition  techniques  (for  clustering  and  linear  discriminant  function  selection)  are 
combined  with  neural-net  methods  (that  provide  an  automated  method  to  combine  linear 
discriminant  functions  into  piecewise  linear  discriminant  surfaces).  The  resulting  "adaptive- 
clustering  neural  net"  is  suitable  for  optical  implementation  and  has  certain  desirable  properties 
in  comparison  with  other  neural  nets.  Simulation  results  are  provided. 

I.  Introduction 

Artificial neural networks have received much recent attention1, 2, 3 and various optical
realizations of the classic backpropagation neural network have been suggested. Various
other optical neural network architectures have been described and some have
been demonstrated conceptually. In this paper we distinguish between optimization and
adaptive  learning  neural  networks  (Sect.  II)  and  we  discuss  various  neural-net  issues  as 
background.  We  then  advance  a  new  "adaptive-clustering  neural  network"  (ACNN)  in  Sect.  III. 
Simulation  results  (performed  on  a  Hecht-Nielsen  Corporation  electronic  neural  network)  are 
then  presented  (Sect.  IV),  optical  realizations  of  the  ACNN  are  discussed  (Sect.  V)  and  a 
summary  is  advanced  (Sect.  VI).  This  ACNN  uses  a  new  learning  algorithm  that  combines 
standard  pattern  recognition  techniques  and  neural-net  concepts  to  arrive  at  a  new  and  quite 
useful  method  for  neural  network  synthesis  that  can  be  realized  optically  with  attractive  results 
and  potential. 


II.  Artificial  Neural  Networks 


We distinguish between two main classes of neural networks14, 15: optimization neural nets
and adaptive learning neural nets. Optimization neural nets are well understood and their basic
theory is well established16, 17. Associative processors are another class of neural
networks16 that are also well understood. In this paper we consider adaptive
learning  neural  nets.  The  major  advantage  of  a  neural  net  in  multiclass  pattern  recognition  is  its 
ability  to  compute  nonlinear  decision  surfaces  (typically  combinations  of  linear  decision  surfaces) 
for  complex  multiclass  decision  problems.  In  fact,  many  neural-net  classifiers  can  create  decision 
boundaries  of  arbitrary  shape.  Our  proposed  neural  net  uses  this  feature  of  neural  nets  in 
conjunction  with  initial  weights  selected  using  class  prototypes  of  clusters  -  hence  we  refer  to  this 
as  an  adaptive-clustering  neural  net.  It  employs  a  three-layered  architecture,  consisting  of  input, 
hidden  and  output  layers  with  interconnections  between  the  input  and  hidden  layers,  and 
between  the  hidden  and  output  layers. 

II.1. Neuron representation spaces and dimensionality

To maintain a reasonable number of input (P1) neurons, we recommend14, 15 that the
neuron representation space be an appropriate feature space. For image recognition applications,
the feature space should not be pixel-based. Other feature spaces have the additional advantage
that they can be made invariant to transformations such as in-plane rotations. This greatly
reduces the number of training images required (i.e., we need not train on transformed versions
of the objects to be identified). For an M-dimensional feature space, we use $M+1$ input
neurons. The additional neuron is used to incorporate the threshold of the hidden-layer neurons
into the input vector, with the state of this neuron set to unity. We now detail this. A linear
discriminant function (LDF) in a feature space described by feature vectors x can be written as

$$g(x) = w^{T} x + w_0, \tag{1}$$

where w defines the orientation of the linear decision boundary and $w_0$ defines its offset or
location. When decisions depend on whether $g \gtrless 0$, then $-w_0$ is the threshold for the
vector-inner product (VIP) $w^{T} x$. By adding an additional "1" to the feature vector x to
produce y, we include $w_0$ in w and we can now write Eq. (1) as

$$g(x) = w^{T} y. \tag{2}$$
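
The equivalence of Eqs. (1) and (2) can be checked numerically. The following Python sketch uses illustrative values only; the function names and the numbers are hypothetical.

```python
def ldf(w, w0, x):
    """Eq. (1): g(x) = w^T x + w0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + w0


def ldf_augmented(w_aug, x):
    """Eq. (2): g(x) = w^T y, with y = [x, 1] and w_aug = [w, w0]."""
    y = list(x) + [1.0]  # the extra input neuron held at unity
    return sum(wi * yi for wi, yi in zip(w_aug, y))


w, w0 = [2.0, -1.0], 0.5  # illustrative weights and offset
x = [1.0, 3.0]            # illustrative feature vector
g1 = ldf(w, w0, x)
g2 = ldf_augmented(w + [w0], x)
```

Both forms give the same value, so the threshold can be learned as one more weight attached to the constant input.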


The  number  of  neurons  in  layer  two  (hidden  layer)  is  generally  chosen  empirically.  The 
number  of  hidden-layer  neurons  determines  the  complexity  of  the  decision  surface.  Thus  too  few 
neurons  lead  to  poor  classification  performance,  since  a  decision  surface  of  complexity  sufficient 
to  separate  the  various  classes  cannot  be  created.  In  most  neural  nets,  the  use  of  too  many 
hidden  neurons  is  wasteful  of  resources  and  leads  to  poor  generalization.  By  this  we  mean  that 
the  decision  surfaces  are  adapted  to  the  peculiarities  of  the  training  set. 

Local  minima  are  a  frequent  topic  of  discussion  associated  with  the  number  of  hidden 
neurons  used.  A  local  minimum  is  a  value  of  the  energy  function  that  is  a  minimum  in  a  local 
region,  rather  than  being  a  global  minimum.  In  training  a  back  propagation  (BP)  neural  net6,  the 
initial  state  of  the  hidden-layer  neurons  is  random  and  a  given  error  rate  and  some  energy  is 
obtained.  When  training  is  repeated  with  different  initial  hidden  neuron  states,  if  a  different 
error  rate  results,  a  local  minimum  exists.  One  must  vary  the  number  of  hidden  neurons  and 
retrain  with  different  initial  conditions  to  empirically  determine  the  number  of  hidden  neurons. 
The  presence  of  such  variables  results  in  long  training  times  for  neural  nets  (as  various  numbers 
of  layer-two  neurons  and  various  starting  conditions  are  tried)  and  it  can  result  in  a  neural  net 
that  cannot  easily  be  generalized  to  test  data. 

Local minima occur when hidden neurons become redundant during training (e.g., two of
the hidden neurons encode decision boundaries that lie very close to one another). If each
neuron encoded a distinct decision boundary, a lower error rate would result (if the number of
neurons were too few). When the number of distinct hidden neurons is sufficient (equal to or
greater than the minimum required), there is no effect on classification performance, since
sufficiently complex decision surfaces can be created despite redundancies in the hidden neurons.
Thus, in this case local minima are not of concern. Many researchers have found that extensive
methods to produce 100% classification on training data are not merited, since test set
performance often does not reflect such improved training set results. Recent work on the
choice of the number of hidden neurons has concentrated on the case when the training samples
are in random positions in the feature space, which is almost never the case in real
pattern-recognition problems.

Thus,  although  local  minima  are  not  of  major  concern,  an  alternate  technique  to  determine 
the  number  of  hidden  neurons  with  significantly  reduced  effort  is  a  significant  concern.  Our  new 
neural  net  addresses  this  issue  by  an  organized  procedure  that  selects  the  number  of  hidden 
neurons  based  on  the  number  of  clusters  present  in  the  multiclass  data  to  be  separated  (as 
detailed  in  Sect.  III). 

The  number  of  neuron  layers  used  is  another  variable.  For  BP,  it  has  been  shown24,  25  that 
any decision surface can be approximated to arbitrary accuracy with a three-layer neural net.
Four-layer  neural  nets  can  also  produce  any  such  decision  surface,  but  they  are  harder  to  train 
(since  the  Hessian  of  the  criterion  function  with  respect  to  the  weights  is  more  ill-conditioned 
when  more  layers  are  used)  and  they  generally  introduce  more  parameters  that  must  be 
empirically  selected.  Since  our  neural  net  also  approximates  any  such  decision  boundary  with 
three  layers,  we  restrict  attention  to  a  three-layer  neural  net. 
The number of output-layer neurons equals the number of classes.

II.2. Criterion or error functions

One  of  the  most  popular  adaptive  learning  neural  nets  is  backpropagation  (BP)6.  The 
problems  with  this  neural  net  are  that  it  requires  a  large  training  set  and  long  training  time,  and 
does  not  necessarily  converge  to  the  best  minimum.  Backpropagation  is  an  example  of  a  neural 
net  which  is  trained  by  the  minimization  of  an  error  or  criterion  function.  The  form  of  the  error 
function  that  is  minimized  for  such  nets  can  affect  performance  and  training  time  (e.g.,  the  error 
function  with  the  best  error  rate  is  often  the  one  for  which  it  is  most  difficult  to  reach  a 
minimum error26). Standard BP uses an error function based on a sigmoid transfer function,
while  our  ACNN  uses  the  perceptron  error  function  in  training.  We  recently  provided  a 
comparison of various error or criterion functions. It was shown that, in general, the use of a
perceptron  criterion  function  provides  faster  convergence  with  comparable  error  rates  to  those 
obtained  with  the  more  popular  sigmoid  criterion  function.  The  error  function  choice  is  not  of 
major  concern  in  the  performance  of  BP  and  our  ACNN  (it  is  included  to  note  the  differences 
between  BP  and  ACNN  and  because  the  criterion  function  used  specifies  the  type  of  linear 
classifier  employed,  as  we  detail  in  Sect.  III). 

II.3. Update algorithm

One reason for the slow convergence of BP is that a gradient-descent (delta rule) algorithm
is often used to update the weights in training. Our ACNN uses a conjugate-gradient
algorithm27 for weight update since it is faster and does not require the empirical choice of
parameters such as the learning rate and momentum. In conjugate-gradient updating, all of
the training set data are fed to the system (once) and then the weights are updated. Conversely,
with gradient descent the weights can be updated after the presentation of each sample in the
training set. A batch type of gradient-descent algorithm can also be used, with weights updated
only after all training data have been presented to the system once. Generally, batch gradient
descent has the slowest convergence (since the parameters cannot be updated and selected at
different steps). Sequential (non-batch) gradient descent generally performs better than batch
gradient descent, since it makes more steps toward the solution (in one presentation of the
training set of data). However, selection of its parameters is empirical and we have found that
conjugate-gradient optimization performs better. We attribute this to the fact that
conjugate-gradient optimization adapts the learning parameters in a sensible way, whereas these
parameters are kept fixed or adapted heuristically for gradient descent.
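
The difference between sequential and batch updating can be made concrete with a short Python sketch. It uses plain gradient descent on a perceptron criterion with a fixed step size; it does not implement the conjugate-gradient algorithm used for the ACNN. The data, step size, and function names are hypothetical.

```python
def dot(w, y):
    return sum(wi * yi for wi, yi in zip(w, y))


def epoch_sequential(w, samples, eta=1.0):
    """Update w right after each misclassified sample; return update count."""
    updates = 0
    for y in samples:
        if dot(w, y) <= 0:  # misclassified under the perceptron criterion
            for i, yi in enumerate(y):
                w[i] += eta * yi
            updates += 1
    return updates


def epoch_batch(w, samples, eta=1.0):
    """Accumulate over all misclassified samples, then update w once."""
    step = [0.0] * len(w)
    n_wrong = 0
    for y in samples:
        if dot(w, y) <= 0:
            n_wrong += 1
            for i, yi in enumerate(y):
                step[i] += yi
    for i in range(len(w)):
        w[i] += eta * step[i]
    return n_wrong


def train(epoch_fn, samples, max_epochs=100):
    """Repeat passes over the training set until nothing is misclassified."""
    w = [0.0] * len(samples[0])
    for _ in range(max_epochs):
        if epoch_fn(w, samples) == 0:
            break
    return w


# Augmented samples [x1, x2, 1]; class-2 samples are negated so that a
# correct weight vector gives dot(w, y) > 0 for every sample.
samples = [[2.0, 1.0, 1.0], [3.0, 2.0, 1.0], [1.0, 1.0, -1.0], [2.0, 0.0, -1.0]]
w_seq = train(epoch_sequential, samples)
w_batch = train(epoch_batch, samples)
```

Both loops separate this toy data set. The sequential form can change the weights several times within one presentation of the training set, whereas the batch form changes them exactly once per presentation.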

In  difficult  multiclass  decision  problems  we  have  found  conjugate-gradient  training  to  be 
much  more  efficient  than  gradient  descent.  With  neural  net  hardware  and  software  (such  as  the 
Hecht-Nielsen  Corporation  AZP  which  we  use)  conjugate-gradient  optimization  is  very 
attractive.  In  our  comparisons  of  BP  and  the  ACNN  we  use  the  same  conjugate-gradient 
algorithm  to  update  the  weights. 

II.4. Initial weights

Another  reason  for  the  long  training  time  for  BP  is  that  the  initial  weights  are  chosen 
arbitrarily.  In  our  ACNN  algorithm,  the  initial  weights  are  set  using  pattern-recognition 
techniques  and  then  they  are  refined  using  neural-network  techniques.  This  is  a  major  reason  for 
the  improved  performance  of  our  ACNN.  We  have  tested  BP  using  initial  weights  chosen  from 
clustering techniques similar to those used for the initial weights of the ACNN. We found29
negligible  improvement  in  training  time  and  worse  performance  in  some  cases.  We  attribute  this 
to  the  fact  that  BP  can  sometimes  use  hidden  neurons  in  more  sophisticated  ways  than  is  the 
case  in  the  hidden  layer  of  our  ACNN  and  that  this  cannot  be  achieved  when  a  preset  weight 
choice  is  used. 

This present section was intended to highlight issues associated with neural networks and
to note differences between our algorithm and the more extensively tested and analyzed BP
algorithm.

III.  Adaptive  clustering  neural  net  (ACNN)  training  algorithm 

Our three-layer ACNN is shown in Fig. 1. It is similar to the standard multilayer
perceptron. We now detail its design and use for multiclass pattern recognition. The input (P1)
neurons are analog and represent a feature space which can be of low dimensionality (we add an
additional feature which is always kept at unity to adapt the threshold of the hidden neurons as
well). The hidden-layer neurons at P2 correspond to clusters in feature space, with several
clusters (neurons) used for each class in a multiclass application. The P1-P2 weights are used to
assign an input to a cluster. We typically use two to five clusters per class. The layer-two
neurons are binary and (in testing) the P2 neuron with the largest input activity fires and
denotes the cluster to which the input belongs. During training the P1-P2 weights adapt as we
will detail (we employ a conjugate-gradient algorithm) and thus refine our initial weight
estimates. The hidden-layer-to-output weights are fixed (all are either zero or one) and perform
the mapping of the P2 clusters to one of the classes (with one P3 neuron assigned per class of
data). Thus, we initially assign several layer-two "cluster neurons" to each class and use fixed
P2-P3 weights to assign each P2 cluster to a final class (output neuron in P3). This is attractive
and new since it allows us to use standard clustering and pattern-recognition techniques to select
the initial P1-P2 weights (initial LDFs) and new neural-net techniques to adapt or refine these
weights. We employ a perceptron criterion or error function (this defines our LDFs) rather than
a sigmoid error function, since faster convergence with a comparable error rate is obtained.
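
The testing-mode operation described above (largest hidden-layer activity fires, then a fixed 0/1 mapping to a class) can be sketched in Python as follows. The prototype values and all names are hypothetical. The initial LDF built from a cluster prototype m, $g(x) = m^{T}x - \tfrac{1}{2}\lVert m \rVert^2$, is one common nearest-prototype choice assumed here for illustration; the text does not state that the ACNN uses exactly this form.

```python
def ldf_from_prototype(m):
    """Augmented LDF whose response is largest for the nearest prototype:
    g(x) = m.x - 0.5*|m|^2 (an assumed initialization, not from the source)."""
    return list(m) + [-0.5 * sum(v * v for v in m)]


def acnn_classify(x, w_hidden, cluster_class):
    """Classify feature vector x with a winner-take-all hidden layer.

    w_hidden: one augmented weight vector (LDF) per hidden cluster neuron.
    cluster_class: class index for each hidden neuron; this plays the role
                   of the fixed 0/1 hidden-to-output weights.
    Returns (class index, index of the winning cluster neuron).
    """
    y = list(x) + [1.0]  # extra input neuron held at unity
    activity = [sum(wi * yi for wi, yi in zip(w, y)) for w in w_hidden]
    winner = max(range(len(activity)), key=activity.__getitem__)
    return cluster_class[winner], winner


# Two hypothetical clusters per class in a 2-D feature space.
prototypes = [(0.0, 0.0), (0.0, 4.0), (4.0, 0.0), (4.0, 4.0)]
cluster_class = [0, 0, 1, 1]
w_hidden = [ldf_from_prototype(m) for m in prototypes]

label_a = acnn_classify([3.5, 0.5], w_hidden, cluster_class)
label_b = acnn_classify([0.2, 3.0], w_hidden, cluster_class)
```

Training would then adapt `w_hidden` while leaving `cluster_class` fixed, which is what allows several linear boundaries per class to combine into a piecewise linear decision surface.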

There  are  no  commonly-used  standard  (non-neural  net)  techniques  to  obtain  piecewise 
linear  decision  surfaces  for  two-  or  multiclass  problems  (except  nearest-neighbor  methods). 
Because  of  the  importance  of  neural-net  techniques  in  addressing  this  problem,  and  since  we  use 
nearest-neighbor techniques in selecting our clusters, we briefly review standard multiclass
techniques.  In  a  nearest-neighbor  classifier,  the  distance  between  an  input  and  all  training 
samples  is  calculated  and  the  input  is  assigned  to  the  class  of  the  closest  training  sample.  From 
tests  on  all  training  data  in  each  class,  the  bounds  on  each  class  are  determined  and  one  can 
obtain  piecewise  linear  decision  surfaces.  However,  the  nearest-neighbor  technique  is 
computationally  intensive  (requiring  calculation  of  the  distance  to  all  training  samples). 
Conversely,  neural  nets  have  a  long  training  time  (which  is  off-line  and  of  less  concern)  but  their 
classification  times  (an  on-line  requirement)  are  short.  In  addition,  all  training  samples  must  be 
stored  for  a  nearest-neighbor  system  and  thus  storage  requirements  can  be  excessive.  Finally, 
nearest-neighbor  systems  do  not  perform  well  when  the  probability-density  functions  of  the 
classes  overlap  significantly.  The  calculation  of  the  K  nearest  neighbors  is  useful  here  (the  input 
is  assigned  to  the  class  to  which  the  majority  of  these  K  samples  belong).  However,  the  selection 
of  K  is  empirical. 

Two  other  multiclass  techniques  are  Gaussian  and  linear  classifiers.  Gaussian  classifiers 
assume  that  the  data  in  each  class  are  normally  distributed  and  for  each  class  its  mean  and 
variance  are  estimated.  To  classify  an  input  vector,  a  posteriori  probabilities  are  calculated  for 
each  class  with  Bayes’  rule,  and  the  input  is  assigned  to  the  class  with  the  highest  probability. 
This technique (like all parametric methods) works only if the data follow the assumed
distribution, and this is rarely the case. To produce multiclass decision boundaries with LDFs,
the mean vector m_c of each class can be calculated and used as an LDF. The VIP of the input
with each m_c, followed by thresholding, gives the class estimate for the input. Criterion functions
(error  functions)  represent  a  preferable  way  to  select  an  LDF  for  each  class.  One  can  employ 
pairwise  LDFs  (for  each  LDF,  some  class  i  is  compared  with  another  class  j).  These  approaches 
are  computationally  intensive  and  not  attractive  for  problems  with  many  classes  and  they  may 
lead to decision surfaces that have undefined regions (not corresponding to any class).

Thus, standard linear-discriminant techniques for multivariate pattern recognition allow us
to  determine  suitable  linear  discriminants,  but  these  are  generally  not  powerful  enough  for 
realistic  pattern-recognition  applications  that  require  nonlinear  decision  surfaces.  In  our  ACNN, 
neural-net  techniques  provide  refinements  to  the  linear-discriminant  weight  estimates  and 
automatically  combine  many  linear  decision  boundaries  into  piecewise  linear  decision  boundaries. 
We  now  detail  the  design  and  update  rules  for  our  ACNN. 

III.1.  Selection of the number of hidden layer (cluster) neurons

To select the prototypes (exemplars, or cluster representatives) we use two steps. As our
prototypes we desire the N samples in the training set whose removal causes the most error in
a nearest-neighbor classification. We assume a large training set (N_T samples) for our multiclass
problem (so large that simple clustering techniques cannot produce a suitable set of clusters). We
first use standard techniques for sample-number reduction to obtain a modest number of
prototypes N_R. This "reduced nearest-neighbor" clustering technique divides the N_T samples
into two groups (A and B), where the samples in A classify all samples correctly using a
nearest-neighbor technique. Initially, all samples are in group B. The samples in A are used as
the prototypes in a nearest-neighbor classifier. Each sample in B is sequentially presented to the
nearest-neighbor classifier. If it is incorrectly classified, it is added to A. This procedure is
repeated until the samples in group A can correctly classify all N_T samples. (Typically around
5% to 30% of the training samples are still present in N_R and this is still too large a number of
P2 neurons.)
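
The first selection step can be sketched in Python as follows. This is a minimal illustration of our reading of the "reduced nearest-neighbor" procedure, not the code used for the benchmarks: the function names are hypothetical, and seeding group A with the first sample presented (since an empty A cannot classify anything) is an assumption.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def nn_label(x, protos):
    """Class label of the prototype (vector, label) nearest to x."""
    return min(protos, key=lambda p: sq_dist(x, p[0]))[1]


def reduce_prototypes(samples):
    """Grow group A until it classifies every sample correctly.

    samples: list of (feature_vector, class_label) pairs (all in group B at first).
    Returns the indices of the samples that end up in group A.
    """
    in_a = []                       # indices currently in group A
    changed = True
    while changed:                  # repeat passes until a pass adds nothing
        changed = False
        for idx, (x, label) in enumerate(samples):
            if idx in in_a:
                continue
            protos = [samples[k] for k in in_a]
            # Assumption: the first sample seeds A, since an empty A cannot classify.
            if not protos or nn_label(x, protos) != label:
                in_a.append(idx)
                changed = True
    return in_a
```

Each pass costs one nearest-neighbor search per sample still in B, which is why this step is suited to off-line use only.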

Thus  we  employ  a  second  step  to  further  reduce  the  number  of  prototypes  (clusters)  to  an 
acceptable number N. To achieve this we remove the first prototype, use the remaining N_R - 1
samples in a nearest-neighbor classifier to classify the N_T original samples, and calculate the
number of misclassifications. We then remove only the second prototype, and repeat the above
procedure with the remaining N_R - 1 samples. This procedure continues until the removal
(separately) of each of the prototypes has been tested. If N is prespecified, we keep the N
prototypes whose removal would cause the most errors. We can also use the number of errors
obtained by removing each prototype to select N (i.e. we select the N that results in no more than a
given error rate or for which there is a jump in the number of errors produced). We ensure that
at least one prototype is chosen from each class. Ensuring that we keep one prototype per class
has  not  been  a  problem  in  our  benchmarks  (i.e.  if  the  prototypes  are  ordered  by  their  error  rate, 
we  do  not  find  a  number  of  consecutive  prototypes  in  one  class  before  one  from  another  class 
occurs).  In  our  initial  benchmarks  we  have  not  found  significant  branch  points  or  jumps  in  the 
error  rates  of  the  ordered  samples.  There  is  also  no  restriction  that  the  same  number  of 
prototypes  be  selected  from  each  class  (the  data  will  determine  this).  Considerable  flexibility  is 
possible  in  how  the  N  prototypes  are  selected  since  training  will  refine  the  initial  choices,  and 
thus  this  issue  is  not  of  major  concern. 
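
The second step can be sketched in the same way. The sketch below ranks each prototype by the number of errors its separate removal causes and keeps the N highest-ranked ones. The function names are hypothetical, and the rule used to guarantee one prototype per class (take the best-ranked prototype of each class first, then fill the remaining slots by rank) is an assumption, since the text does not fix a particular rule.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def nn_label(x, protos):
    """Class label of the prototype (vector, label) nearest to x."""
    return min(protos, key=lambda p: sq_dist(x, p[0]))[1]


def removal_errors(protos, samples):
    """errors[k] = misclassified samples when only prototype k is removed.

    Requires at least two prototypes so that the remaining set is never empty.
    """
    errors = []
    for k in range(len(protos)):
        rest = protos[:k] + protos[k + 1:]
        errors.append(sum(1 for x, label in samples if nn_label(x, rest) != label))
    return errors


def select_prototypes(protos, samples, n_keep):
    """Indices of the n_keep prototypes whose removal causes the most errors."""
    errors = removal_errors(protos, samples)
    order = sorted(range(len(protos)), key=lambda k: -errors[k])  # stable sort
    keep, seen = [], set()
    for k in order:                 # assumed rule: best prototype of each class first
        if protos[k][1] not in seen:
            seen.add(protos[k][1])
            keep.append(k)
    for k in order:                 # then fill the remaining slots by rank
        if len(keep) >= n_keep:
            break
        if k not in keep:
            keep.append(k)
    return sorted(keep[:n_keep])
```

The list returned by `removal_errors` can also be inspected directly for a jump in the error counts when N is not prespecified.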

This  procedure  does  not  account  for  the  fact  that,  when  several  samples  are  not  included  as 
prototypes,  performance  will  be  worse  than  when  only  one  of  the  samples  is  omitted.  However, 
the  purpose  of  selecting  prototypes  (or  cluster  representatives)  is  only  to  provide  a  reasonable  or 
approximate  initial  selection  (the  neural  net  adaptations  of  these  initial  choices  address  the 
global  problem). 

We  note  that  use  of  a  nearest-neighbor  technique  for  training  is  acceptable,  but  it  is  not 
suitable  for  classification  (where  on-line  real-time  requirements  exist).  The  combination  of  our 
nearest-neighbor  prototype  selection  and  ACNN  update  algorithm  will  be  shown  to  require  fewer 
iterations  than  BP.  To  quantify  the  significance  of  this,  we  now  briefly  address  the  number  of 
operations  required  to  select  prototypes  and  relate  it  to  the  number  of  operations  required  in  one 
BP iteration on all N_T training samples. For each sample, our prototype selection algorithm
must calculate the distance to all other points in the training set. For all N_T samples, the
calculation of the distances from all points to all points (i.e. the number of distance calculations
required for one pass through the training samples) is approximately 0.5 N_T^2 (we precalculate
this once and use the 0.5 factor since the calculations are symmetric). In BP, all N_T samples are
presented and after each sample we must calculate the activities of all N neurons (N hidden
neurons are assumed and the calculation of the activities of the output neurons is ignored), i.e.
N_T N calculations are required. The calculation times for the operations in the two cases are
equivalent: each is a VIP of dimension equal to that of the feature space used (the calculation
times for each operation are exact for the case of layer 1 and 2 neurons). If the additional
number of BP iterations required is I, then for our algorithm to be computationally efficient, we
require

$$ 0.5\,N_T^2 < N_T N I. \qquad (3) $$

Since N_T >> N, our algorithm may not offer a significant advantage in training time (once N is
fixed in BP) unless I is very large.

In  obtaining  the  result  in  Eq.  (3),  we  assumed  that  all  samples  were  used  in  selecting 
the  N  prototypes.  We  have  found  that  we  need  only  use  approximately  5N  randomly  selected 
samples from the full set of N_T in our prototype selection (N is the number of prototypes or
cluster neurons used at P2 and we have always found that 2 to 5 neurons per class suffice). Thus,
we employ our algorithm using 5N samples (not N_T). The inequality to be satisfied is now
0.5(5N)^2 < N_T N I, or

$$ 25N < 2 N_T I. \qquad (4) $$

To further evaluate this, we assume N_T ≈ 100N (this is quite typical for distortion-invariant
problems to adequately represent all distortions). We then find

$$ 25 < 200\,I, \qquad (5) $$

which is independent of N. This inequality is always satisfied. As we shall see, BP has always
required at least on the order of I=100 more iterations of the full training set than has our
ACNN algorithm. In this case

$$ 25 < 2 \times 10^4, \qquad (6) $$

and the computational time savings of our algorithm is quite significant.
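
The operation counts compared above can be checked numerically with a short Python sketch. The function names are hypothetical and the values in the usage note are illustrative only; the counts follow the text in treating one distance calculation and one hidden-neuron activity as one VIP each.

```python
def selection_vips(n_used):
    """VIPs for prototype selection on n_used samples: 0.5 * n_used^2."""
    return 0.5 * n_used ** 2


def extra_bp_vips(n_train, n_hidden, extra_iters):
    """Hidden-layer VIPs spent by BP on its additional iterations: N_T * N * I."""
    return n_train * n_hidden * extra_iters


def selection_pays_off(n_used, n_train, n_hidden, extra_iters):
    """True when prototype selection costs less than BP's additional iterations."""
    return selection_vips(n_used) < extra_bp_vips(n_train, n_hidden, extra_iters)
```

For example, with N = 10 hidden neurons, N_T = 100N = 1000 training samples, 5N = 50 samples used for selection and I = 100, `selection_vips(50)` gives 1250 VIPs against `extra_bp_vips(1000, 10, 100)`, which is 1,000,000.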


Thus, to summarize, in the two steps of our prototype selection algorithm we use 5N
random samples from the full N_T set. We select the number of hidden neurons N to be 2 to 5
times  the  number  of  classes  (depending  on  the  difficulty  of  the  problem).  Section  IV  details  these 
choices  for  two  examples. 


III.2.  Initial P1-P2 weights

We now address how we select the initial P1-P2 (input-to-hidden layer) weights. We denote
the weight between input neuron j and hidden neuron i by w_ij. We denote the vector position of
prototype i in our D-dimensional feature space by p_i (i.e. this is the feature vector for prototype
i) and element j of it by p_ij. We can now describe the input weights from P1 to P2 as

$$ w_{ij} = \begin{cases} p_{ij} & \text{for } j = 1, \ldots, D \\ -\tfrac{1}{2} \sum_{k=1}^{D} p_{ik}^2 & \text{for } j = D+1. \end{cases} \qquad (7) $$

The first D (out of D+1) elements of each weight vector from P1 to layer-two neuron i are thus
the feature vector p_i associated with that prototype. The last (D+1) input neuron activity is
always "1" and its weight to hidden layer neuron i is associated with its LDF threshold. We
choose these initial weights since they ensure that the classifier initially implements a nearest-
neighbor classifier based on the prototypes, as we now detail.

Each hidden neuron i has connections from all D+1 input neurons and thus has a weight
vector w_i associated with it. For an input x_a, the input to neuron i in layer two is

$$ w_i^T x_a = p_i^T x_a - 0.5\, p_i^T p_i, \qquad (8) $$

where the first term is the contribution to the VIP from the first D weights and the last term is
the contribution due to the additional (D+1) input neuron. We rewrite Eq. (8) as

$$ \begin{aligned} w_i^T x_a &= (0.5)\,2\, p_i^T x_a - 0.5\, p_i^T p_i + 0.5\, x_a^T x_a - 0.5\, x_a^T x_a \\ &= 0.5\,[\, x_a^T x_a - (p_i^T p_i - 2\, p_i^T x_a + x_a^T x_a) \,] \\ &= 0.5\,(\|x_a\|^2 - \|p_i - x_a\|^2). \end{aligned} \qquad (9) $$


From Eq. (9) we see that the VIP is related to the Euclidean distance (denoted by || ||) between
the input x_a and the prototype p_i associated with hidden neuron i. The choice of weights in Eq.
(7) thus achieves nearest-neighbor classification since it ensures, from Eq. (9), that the hidden
neuron closest to x_a will have the largest input (since the second term in Eq. (9) is then smallest)
and will be most active.
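
The equivalence stated by Eqs. (7) to (9) can be checked numerically. The Python sketch below builds the initial weights from a set of prototypes and compares the most active hidden neuron with the nearest prototype; the function names and the example prototypes are hypothetical and chosen only for illustration.

```python
def initial_weights(prototype):
    """Eq. (7): the first D weights are the prototype, the last is -0.5 * ||p||^2."""
    return list(prototype) + [-0.5 * sum(p * p for p in prototype)]


def vip(w, x):
    """Inner product of a (D+1)-element weight vector with the input augmented by 1."""
    return sum(w_j * x_j for w_j, x_j in zip(w, list(x) + [1.0]))


def most_active(weights, x):
    """Index of the hidden neuron with the largest VIP."""
    return max(range(len(weights)), key=lambda i: vip(weights[i], x))


def nearest(prototypes, x):
    """Index of the prototype at the smallest Euclidean distance from x."""
    return min(range(len(prototypes)),
               key=lambda i: sum((p - x_j) ** 2 for p, x_j in zip(prototypes[i], x)))
```

For any input that is not equidistant from two prototypes, `most_active` and `nearest` return the same index, as Eq. (9) predicts.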


III.3.  Training (weight update) algorithm

We now detail how we update the initial P1-P2 weights to achieve improved piecewise
linear decision surfaces. We input each of the full set of N_T training vectors x_a. For each x_a we
calculate the most active hidden neuron $i(c)$ in the proper class c and the most active one
$i(\bar{c})$ in any other class ($\bar{c}$). We denote the weight vectors for these two layer-two
neurons by $w_{i(c)}$ and $w_{i(\bar{c})}$ and their VIPs with the input by $w_{i(c)}^T x_a$ and
$w_{i(\bar{c})}^T x_a$. The perceptron error function
(criterion function) E_p used is shown in Fig. 2. The solid (dashed) curves correspond to the true
(false) classes 1 and 2 cases. The offset S is a safety margin that forces training set vectors which
are classified correctly by a small amount (less than S) to also contribute to the criterion
function. As discussed elsewhere we chose S=0.05 (all features were normalized between 0 and
1). The use of S forces the classifier to try to classify all training samples correctly by at least an
amount S, improving test set performance (and thus generalization).

For each training sample in N_T we add an error (penalty) to E_p. The error added is

$$ E = \begin{cases} 0 & \text{if } w_{i(c)}^T x_a > w_{i(\bar{c})}^T x_a + S \\ S + (w_{i(\bar{c})} - w_{i(c)})^T x_a & \text{otherwise,} \end{cases} \qquad (10) $$

where the E=0 case corresponds to the situation when the proper layer-two neuron is most
active (by an amount S above the most active false neuron) and where the other case
corresponds to the situation when the false class VIP is larger than the true class VIP, or within
S of it.
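
The per-sample error of Eq. (10) can be written as a short Python function. This is a sketch under the definitions above: the function name and argument layout are hypothetical, and `neuron_class[i]` is assumed to hold the class assigned to hidden neuron i.

```python
def sample_error(vips, neuron_class, true_class, margin=0.05):
    """Eq. (10) for one training sample.

    vips:         VIP of the input with each hidden-neuron weight vector
    neuron_class: class assigned to each hidden neuron
    true_class:   correct class of the input
    margin:       the safety margin S
    Returns (E, i_true, i_false), where i_true and i_false are the most active
    neurons in the true class and in any other class.
    """
    true_ids = [i for i, c in enumerate(neuron_class) if c == true_class]
    false_ids = [i for i, c in enumerate(neuron_class) if c != true_class]
    i_true = max(true_ids, key=lambda i: vips[i])
    i_false = max(false_ids, key=lambda i: vips[i])
    if vips[i_true] > vips[i_false] + margin:
        return 0.0, i_true, i_false         # correct by more than S: no penalty
    return margin + vips[i_false] - vips[i_true], i_true, i_false
```

A sample that is correctly classified but by less than S still returns a positive error, which is the role of the safety margin.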

After all N_T training samples have been run through the system, we accumulate all of
these errors or energies (all are positive or zero). We also accumulate the gradients
$\nabla_{w_i} E$. From Eq. (10), by taking the derivative with respect to w_i, we see that
$\nabla_{w_i} E$ is zero for all i when an input is classified correctly by more than S; otherwise,
the descent direction $-\nabla_{w_i} E$ equals either x_a (if input a should be
classified into the same class as cluster-neuron i) or -x_a (if a is incorrectly classified by cluster
neuron i). Thus, the sum of all the contributions to $\nabla_{w_i} E$ equals the sum of the ±x_a for
samples erroneously classified (or correctly classified but with a margin less than S) in layer-two
clusters. We then use $\nabla_{w_i} E$ to adapt the weights w_i by the conjugate-gradient algorithm. We
then repeat presentation of the training set (a new iteration), calculate the new errors E and
$\nabla_{w_i} E$ and update the weights accordingly. This procedure repeats until satisfactory performance
on the test set is obtained.
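
One iteration of this procedure can be sketched in Python as below. The accumulation of E and its gradient follows Eq. (10); the weight update, however, is a plain gradient-descent step with a hypothetical fixed rate, substituted here for the conjugate-gradient algorithm purely to keep the sketch short. All names are hypothetical.

```python
def batch_gradient(weights, neuron_class, samples, margin=0.05):
    """Accumulate the error E and its gradient dE/dw_i over the training set."""
    grads = [[0.0] * len(w) for w in weights]
    total = 0.0
    for x, label in samples:
        xa = list(x) + [1.0]                      # augment with the constant input
        vips = [sum(w_j * x_j for w_j, x_j in zip(w, xa)) for w in weights]
        t = max((i for i, c in enumerate(neuron_class) if c == label),
                key=lambda i: vips[i])            # most active true-class neuron
        f = max((i for i, c in enumerate(neuron_class) if c != label),
                key=lambda i: vips[i])            # most active false-class neuron
        if vips[t] > vips[f] + margin:
            continue                              # correct by more than S
        total += margin + vips[f] - vips[t]
        for j, x_j in enumerate(xa):
            grads[t][j] -= x_j                    # dE/dw for the true neuron is -x_a
            grads[f][j] += x_j                    # dE/dw for the false neuron is +x_a
    return total, grads


def descent_step(weights, grads, rate=0.01):
    """Plain gradient-descent update (a stand-in for conjugate gradient)."""
    return [[w - rate * g for w, g in zip(w_row, g_row)]
            for w_row, g_row in zip(weights, grads)]
```

Calling `batch_gradient` and `descent_step` alternately until the accumulated error stops decreasing corresponds to repeated presentations of the training set.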

We considered other LDFs (Ho-Kashyap, Fisher, Fukunaga-Koontz, etc.). However, these
LDFs require more calculation than our current algorithm does to update the weights. Thus, for
computational reasons, our present choice (perceptron criterion) is preferable.

III.4.  Input P1 neuron representation space

In our distortion-invariant multiclass pattern recognition applications, we use a wedge-sampled
magnitude Fourier transform feature space (Ref. 31), since this feature space can easily be
produced optically. Fig. 3(a) shows the standard architecture that produces the Fourier
transform at P2 of the P1 input 2-D image data. Fig. 3(b) shows the standard wedge-ring
detector used at P2. The wedge features provide scale invariance and the ring features provide
in-plane  rotation  invariance.  Our  distortion-invariant  data  will  involve  different  aspect  views  of 
several  objects  (and  not  in-plane  distortions).  Thus  we  chose  the  wedge  features  (this  provides 
scale  invariance,  although  we  do  not  include  scale  distortions  in  our  test  data).  We  obtain 
aspect-view  invariance  by  training  on  various  aspect-distorted  object  views. 
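
A digital analogue of wedge sampling can be sketched in Python as follows. This is only an illustration of the idea and not a model of the optical detector: it assumes the Fourier magnitudes are already available as a 2-D array with the zero-frequency term at the center, it folds the plane onto a half-plane (the magnitude transform of a real image is symmetric), and it skips the zero-frequency term. The function name and these conventions are assumptions.

```python
import math


def wedge_features(magnitude, n_wedges=32):
    """Sum Fourier magnitudes inside n_wedges angular sectors of a half-plane.

    magnitude: 2-D list of non-negative values, zero frequency assumed at
               index (rows // 2, cols // 2).
    """
    rows, cols = len(magnitude), len(magnitude[0])
    cy, cx = rows // 2, cols // 2
    feats = [0.0] * n_wedges
    for r in range(rows):
        for c in range(cols):
            dy, dx = r - cy, c - cx
            if dy == 0 and dx == 0:
                continue                          # skip the zero-frequency term
            angle = math.atan2(dy, dx) % math.pi  # fold onto the half-plane [0, pi)
            k = min(int(angle / math.pi * n_wedges), n_wedges - 1)
            feats[k] += magnitude[r][c]
    return feats
```

Because each feature sums over all radii within a sector, a change of scale moves energy along radial lines and leaves it within the same wedge.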

IV.  Test  results 

We  consider  two  databases:  an  artificial  set  of  data1"*’  2G  (to  demonstrate  the  nonlinear 
surfaces  produced  using  only  two  features)  and  a  set  of  three  aircraft  with  various  azimuth  and 
elevation (3-D) distortions present. We refer to these as benchmarks 1 and 2.

IV.1.  Benchmark 1 results (artificial data)

An  artificial  set  of  383  samples  in  three  classes  (181  in  class  1,  97  in  class  2  and  105  in  class 
3)  with  2  features  was  generated  with  samples  as  shown  in  Fig.  4.  This  problem  definitely 
requires  a  nonlinear  decision  boundary  and  the  results  can  be  shown  in  the  2-D  feature  space. 
This  is  the  purpose  of  this  example,  since  no  separate  test  data  exist.  The  neural  net  used 
contained  three  input  neurons  (two  for  the  features  plus  one  for  the  threshold),  six  hidden 
neurons (two per class) and three output neurons (one per class). All N_T samples were used to
select the prototypes. The first "reduced nearest-neighbor clustering" produced 31 prototypes
(8.1% of the total N_T) that gave an error rate P_e = 0% for all samples. The six prototypes
whose  removal  gave  the  most  error  were  then  selected  in  stage  two. 


After 80 iterations of the full training set, the classification rate (defined as the percentage
of test samples correctly classified) was constant at 97.1% with our ACNN algorithm. After 300
iterations  the  BP  classification  rate  was  constant  at  approximately  the  same  value  (96.3%). 
(This  result  is  the  average  obtained  over  10  runs  with  different  random  initial  weight  sets.)  The 
final  input  layer  weights  to  the  six  hidden  layer  neurons  correspond  to  six  straight  lines  (LDFs) 
in  the  feature  space.  For  BP  these  six  lines  would  define  the  decision  surface.  In  the  ACNN  this 
is  not  the  case  (because  of  the  winner-takes-all  action  at  P2).  The  decision-surface  lines  were 
determined by successively providing all of the possible feature vectors on a grid of x1-x2 values
(for both x1 and x2 in the interval [0,1]) to the classifier, and for each feature vector determining
the  class  into  which  it  is  classified  by  the  neural  net.  The  decision  boundaries  indicate  where  a 
transition  in  classification  occurred.  The  boundaries  thus  obtained  are  shown  in  Fig.  5.  They 
produce  four  separate  regions  of  feature  space  (two  correspond  to  the  same  class  and  the  others 
correspond  to  the  other  two  classes). 

From  inspection  of  Fig.  4  one  would  estimate  that  a  piecewise-linear  decision  surface  with 
at  least  five  straight-line  sections  would  be  needed  to  separate  the  data  adequately  and  that 
about  ten  errors  might  be  expected.  Thus  at  least  five  hidden  neurons  are  expected  to  be  needed. 
In  Fig.  4  we  see  that,  with  six  hidden  neurons,  approximately  10  classification  errors  are  made, 
producing the classification rate of 97.1%.

Figure  6  compares  the  classification  rate  for  the  two  neural  nets  and  for  a  multivariate 
Gaussian  classifier.  Both  neural  nets  give  comparable  classification  rates  (97.1%  and  96.3%) 
after  convergence,  whereas  the  Gaussian  classifier’s  performance  is  worse  (89.5%)  and  by 
definition  does  not  vary  with  the  number  of  iterations  of  the  training  set.  The  speed  of  learning 
of  the  ACNN  is  much  faster  (convergence  in  80  iterations)  than  for  BP  (approximate  convergence 
in 300 iterations). From Eq. (4) this represents approximately an additional
N_T N I = (383)(6)(220) = 505,560 VIP calculations required with BP. The prototype selection
steps in our ACNN algorithm required approximately 0.5 N_T^2 = (0.5)(383)^2 ≈ 73,350 VIPs and
thus the total number of calculations and hence training time for our ACNN is considerably less
than the learning time for BP. We reran the prototype selection portion of our ACNN algorithm
using only 5N=30 samples randomly selected from the 383, ensuring that we obtain at least one
prototype per class. The decision boundaries produced are shown in Fig. 7. As can be seen, the
decision boundaries are virtually identical; the resulting error rates differ by only 0.2% (96.9%
classification was obtained after 100 iterations). This was now achieved with only 0.5(30)^2=450
VIPs for prototype selection.

This  data  set  therefore  indicates  that  similar  performances  can  be  obtained  with  BP  and 
ACNN,  with  ACNN  training  appreciably  faster  than  BP.  We  have  also  seen  that  the  time  for 
prototype  selection  with  ACNN  can  be  made  negligible  by  using  a  reduced  number  of  learning 
samples,  without  affecting  performance  adversely. 

IV.2.  Benchmark 2 results (3-D distorted aircraft data)

As  our  second  data  set,  we  used  synthetic  distorted  aircraft  imagery  and  our  wedge- 
sampled  Fourier  feature  space.  The  imagery  used  were  three  aircraft  (F-4,  F-104  and  DC-10) 
binarized  to  128x128  pixels  with  each  aircraft  occupying  about  the  central  100x64  pixels.  As  our 
training set, we used 630 images of each aircraft (a total of N_T=1890 training set samples). The
images  were  different  azimuth  views  (with  the  aircraft  viewed  from  different  angles  left  to  right) 
and  elevation  views  (with  the  aircraft  viewed  from  different  angles  above  or  below  its  center 
line). The range of azimuth angles used covered -85° to +85° and the elevation angle was
varied from 0° to 90° with 5° increments in each angle (the same image results if negative
elevation angles are used). The input neuron representation space was a 32 element feature space
(the 32 wedge magnitude Fourier samples). The test set used consisted of 578 orientations of
each aircraft not present in the training set (these were views at internal angles about 2.5° in
each direction from those in the training set). Fig. 8 shows three distorted versions of each
aircraft. The left image is the top-down view with 0° variation in elevation and azimuth. The
central image shows a view from an azimuth angle of 45° to the left. The right image for each
object shows an image with elevation angle of 45°.

The  three-layer  ACNN  used  contained  33  input  neurons,  9  hidden  neurons  and  three 
output  neurons  (one  per  class). 

Fig.  9  compares  the  speed  (number  of  iterations  of  the  full  training  set)  and  classification 
performance  for  the  two  neural  nets  and  the  Gaussian  classifier.  Both  neural  nets  yield  the  same 
classification  rate  (98.6%)  compared  to  only  89%  for  the  Gaussian  classifier.  BP  converges  in  350 
iterations  and  our  ACNN  in  fewer  (180)  iterations.  As  with  the  two-dimensional  data  set,  a 
reduced  data  set  for  prototype  selection  can  be  employed  successfully.  It  was  found  that  with 
5N=45 samples used for prototype selection, 98.6% classification performance was obtained
after  180  iterations.  With  this  reduced  number  of  samples,  the  time  for  prototype  selection  is 
negligible  compared  with  the  time  for  a  single  iteration,  so  that  the  relative  training  times  are 
again  determined  by  the  number  of  iterations  required  for  each  method.  Thus,  ACNN  requires 
approximately  50%  of  the  training  time  of  BP. 

V.  Optical  and  optical/electronic  realization 

Many choices are possible for the role of optics in the learning and classification stages of
our ACNN. These are now discussed. The feature space (wedge-sampled magnitude Fourier
transform) should be optically calculated (even in learning) since this feature space is easily
produced optically (Refs. 32, 33) and since we will use the optically produced feature space in our
on-line classification. The two steps of prototype selection are best performed electronically, since
they are off-line operations and require manipulation of stored data and control operations most
compatible with digital electronics. The distance calculations required in the nearest-neighbor
calculations can be performed on an optical VIP architecture (we now discuss this and the use of
optics in the learning stage).


Once the initial P1-P2 weights have been chosen, the learning stage can be implemented in
optics or electronics. Fig. 10 shows one such architecture. The input sample x_a is entered at P1
(on LEDs, laser diodes or a 1-D spatial light modulator (SLM)). It is imaged onto the initial set
of N weight vectors (for the N prototype hidden layer neurons) which are arranged on rows at
P_A (with the first two to five rows corresponding to the prototypes for class 1, the next two to
five rows being the prototypes for class 2, etc.). Thus the rows at P_A are the initial weights as
in Eq. (7). The VIPs of x_a and all of the w_i weight vectors at P_A are formed on a linear
detector array at P2. The P_A rows and P2 elements are separated into C groups (the C classes).
The maximum VIP element in each class is determined (simple comparator logic is sufficient
since the number of prototypes per class is small). This provides us with $w_{i(c)}^T x_a$ and
$w_{i(\bar{c})}^T x_a$ in Eq. (10). Bipolar values for w_i should be handled by spatial multiplexing
at P_A and subtraction of adjacent outputs. Alternatively, the P_A data can be placed on a bias (but
this increases dynamic-range requirements). The weights must be updated after each iteration of the
training set. If P_A is a microchannel spatial light modulator (or similar device) that can record
positive and negative data (with a bias on the device), we can update the weights by adding and/or
subtracting the appropriate values for each weight. These updates to the weights at P_A are
various combinations of the training vectors x_a. These could be calculated in electronics, entered
sequentially at P1 and (with a mechanism to activate only selected rows at P_A) we could update
P_A as required. Alternatively, we could repeat each x_a at P1 and vary the input illumination and
the P_A row accessed and hence control the amount of each x_a added to or subtracted from each
weight vector at P_A. The digital control required, the complexity of the system (a modulated
light source to control the amount of each x_a used, access to only one row of P_A at a time), the
need for N accesses of P_A for each of the N_T vectors x_a, and the SLM requirements make
the electronic calculation of the updated weight vectors and the electronic off-line
implementation of the learning stage preferable (at present). As SLM technology matures, it
would probably be realistic to calculate all VIPs optically, determine the new weights
electronically, and reload these directly into P_A after each iteration of the training set. However,
at present, we assume that all learning is electronic (since it is off-line).

Once learning has been completed, the P1-to-P2 weights are fixed and the input-to-hidden
layer neurons and weights (the P1-to-P2 neuron system) can be implemented on an optical VIP
system (such as P1 to P2 of Fig. 10) with a fixed mask at P_A. The number of P1 neurons is
modest (the input neuron representation is a compact feature space), and the number of P2
neurons is also small (typically less than five times the number of classes). Our ACNN requires a
winner-takes-all (WTA) maximum selection of the most active P2 neuron. This can be
implemented with a WTA neural network or with standard comparison techniques. Since the
number of P2 neurons (N) is small, standard electronic WTA techniques are preferable (we
quantify this below). Since the P2-to-P3 hidden-to-output neuron weights are fixed and are all
unity or zero, the P2-to-P3 weights simply perform a mapping and can easily be implemented in
electronics. Thus, we implement the input-to-hidden layer neuron weights and calculations
optically, and the hidden layer neuron maximum selection (WTA) and the hidden-to-output
neuron mapping in electronics. Fig. 11 summarizes the learning and classification stages in block
diagram form with attention to which operations are performed in optics and which in
electronics.
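
The classification stage described above (VIPs from P1 to P2, WTA selection at P2, and a unity/zero mapping from P2 to P3) can be sketched in Python as follows. This is a software illustration only; the function name, the argument layout and the example weights are hypothetical.

```python
def classify(x, weights, p2_to_p3):
    """Return the index of the active output (P3) neuron for input x.

    weights:  one (D+1)-element weight vector per hidden (P2) neuron
    p2_to_p3: one row per output neuron, with a 1 for each hidden neuron
              assigned to that class and 0 elsewhere
    """
    xa = list(x) + [1.0]                          # constant (D+1)th input
    vips = [sum(w_j * x_j for w_j, x_j in zip(w, xa)) for w in weights]
    winner = max(range(len(vips)), key=lambda i: vips[i])        # WTA at P2
    hidden = [1 if i == winner else 0 for i in range(len(vips))]
    outputs = [sum(m * h for m, h in zip(row, hidden)) for row in p2_to_p3]
    return outputs.index(1)
```

With three hidden neurons of which the first belongs to class 0 and the other two to class 1, the mapping is `[[1, 0, 0], [0, 1, 1]]`; only the winner's column contributes, so exactly one output neuron is active.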

The two WTA electronic techniques possible (in classification) are to use an operational
amplifier peak detector to scan all N outputs at P2 or to employ a parallel digital technique. In
the digital technique, the N outputs are A/D converted, the P2 outputs are compared in pairs
(1 and 2, 3 and 4, etc.) and the maximum of each pair is obtained. Pairwise comparisons
of the N/2 outputs are then performed and the procedure is continued for log2 N levels until the
maximum is obtained. For 100 input and hidden neurons, one matrix-vector multiplication
(required to update the P2 neuron activities) requires about 10,000 additions and 10,000
multiplications, whereas maximum selection requires only about 100 comparisons. Thus, the
maximum-selection is typically negligible computationally compared with the neuron-update
stage, and can be implemented in serial electronic hardware without sacrificing the speed of the
system. We thus implement the WTA operation in electronics using comparators rather than
with a neural net. The specific electronic WTA technique chosen depends on the accuracy and
speed required. Since these operations are required once for each test input in classification, the
WTA time required is set by the rate at which new input image data occurs and the rate at
which its features can be calculated.
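
The pairwise digital WTA technique can be sketched in Python as below. The sketch is sequential, whereas the hardware would perform the comparisons of each level in parallel; the function name is hypothetical, and letting an unpaired output advance unchallenged when a level has an odd count is an assumption.

```python
def wta_pairwise(values):
    """Find the maximum by pairwise comparisons, level by level.

    values must be non-empty. Returns (index_of_maximum, levels, comparisons).
    """
    contenders = list(range(len(values)))
    levels = 0
    comparisons = 0
    while len(contenders) > 1:
        survivors = []
        for k in range(0, len(contenders) - 1, 2):
            a, b = contenders[k], contenders[k + 1]
            survivors.append(a if values[a] >= values[b] else b)
            comparisons += 1
        if len(contenders) % 2:                   # odd one out advances unchallenged
            survivors.append(contenders[-1])
        contenders = survivors
        levels += 1
    return contenders[0], levels, comparisons
```

For N = 100 outputs this uses 99 comparisons in 7 levels, consistent with the estimate of about 100 comparisons given above.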

VI.  Summary,  conclusions  and  discussion 

A  new  three-layer  adaptive-clustering  neural  net  (ACNN)  has  been  described.  It  provides 
for  a  new  procedure  to  select  the  number  of  hidden  layer  neurons  (we  use  several  neurons  per 
class,  each  being  a  prototype  or  cluster  representative  of  a  particular  class)  and  provides  initial 
(non-random)  input-to-hidden  layer  neuron  weights.  These  initial  weights  are  selected  using 
standard  pattern  recognition  clustering  techniques.  They  are  then  updated  during  learning  using 
a  new  neural  net  adaptive  supervised  learning  algorithm.  This  results  in  a  new  neural  net  that 
combines  standard  pattern  recognition  and  neural-net  techniques  to  produce  piecewise-linear 
decision  surfaces  from  linear  discriminant  functions.  The  input  neurons  are  analog  and  of  low 
dimensionality  (a  feature  space  with  inherent  distortion  invariances).  Quantitative  data  show 
that  learning  in  our  new  ACNN  is  significantly  faster  (by  a  factor  of  2  to  4,  in  both  learning 
time  and  number  of  calculations)  than  in  the  more  well-studied  BP  neural  net.  We  also  found  that  the 
use  of  a  conjugate-gradient  (rather  than  gradient-descent)  update  algorithm  significantly  speeds 
up  BP. 
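
As a minimal sketch of the initialization and classification steps summarized above, the following assumes plain k-means as the clustering technique and a nearest-prototype linear discriminant for each hidden neuron. Neither choice is confirmed by the text, and the names `class_prototypes`, `init_acnn` and `classify` are hypothetical. The adaptive supervised learning step is not shown.

```python
import numpy as np

def class_prototypes(X, k, iters=10):
    """Plain k-means (Lloyd iterations) on one class; returns k prototypes."""
    centers = X[:k].copy()                       # deterministic start: first k samples
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)                # nearest center for each sample
        for j in range(k):
            if np.any(labels == j):              # keep old center if cluster is empty
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def init_acnn(X, y, k):
    """Use per-class prototypes as initial (non-random) input-to-hidden weights."""
    classes = np.unique(y)
    W = np.vstack([class_prototypes(X[y == c], k) for c in classes])
    hidden_class = np.repeat(classes, k)         # class label of each hidden neuron
    return W, hidden_class

def classify(W, hidden_class, x):
    # One linear discriminant per hidden neuron; the largest wins (WTA),
    # and the winner is mapped to its class.
    g = W @ x - 0.5 * (W ** 2).sum(axis=1)
    return hidden_class[int(np.argmax(g))]

# Illustrative two-class data (assumed, not from the report).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.3, (20, 2)), rng.normal(2, 0.3, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
W, hidden_class = init_acnn(X, y, k=2)
print(classify(W, hidden_class, np.array([-2.0, -2.0])))
```

Because only the winning hidden neuron determines the class, the boundary between two classes is made up of segments of the hyperplanes on which two discriminants are equal, which gives a piecewise-linear decision surface.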


BP  and  the  ACNN  will  usually  not  result  in  similar  weights,  for  three  reasons:  BP  uses  neurons 
for  operations  other  than  clustering;  BP  has  no  WTA  competition  in  its  hidden  layer,  as  the 
ACNN  does;  and  the  hidden-to-output  weights  differ  in  BP,  whereas  in  the  ACNN  they  only  perform 
mapping.  However,  the  decision  boundaries  that  result  are  usually  very  similar 
(with  the  ACNN  decision  boundaries  generally  being  a  piecewise-linear  approximation  to  the 
more  curved  ones  in  BP).  Thus  the  two  classifiers  employ  different  means  to  similar  ends,  with 
the  ACNN  providing  faster  training  without  the  need  to  select  many  empirical  parameters.  Since 
only  one  hidden  neuron  in  the  ACNN  is  dominant,  piecewise-linear  surfaces  result  and  more  hidden 
neurons  may  be  needed.  Our  intent  is  not  to  compare  BP  and  our  ACNN;  rather,  we  note  the 
attractive  properties  of  our  new  neural  net.  Besides  providing  a  new  way  to  select  the  hidden 
neurons,  our  neural-net  algorithm  has  only  one  ad-hoc  parameter  to  be  empirically  selected  (the 
number  of  hidden  neurons).  Changes  in  ACNN  weights  during  training  provide  information  on 
the  data  that  can  be  of  use  in  better  understanding  results  and  in  extending  results  to  other 
cases  (other  neural  nets  do  not  have  this  property).  For  example,  in  sequential  gradient  descent 
updating  algorithms  (the  delta  rule)  different  results  occur  depending  on  the  order  in  which  the 
training  data  are  presented  and  depending  on  the  random  initial  weights  (by  comparison,  the 
ACNN  provides  consistent  results). 
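
The order dependence of sequential delta-rule updating can be seen in a small sketch. The data, learning rate and function name `delta_rule_epoch` are assumptions chosen for illustration. A single linear neuron is trained for one pass over the same three samples, presented in two different orders.

```python
import numpy as np

def delta_rule_epoch(w, X, t, order, lr=0.1):
    """One pass of the sequential delta rule over samples in the given order."""
    w = w.copy()
    for i in order:
        err = t[i] - w @ X[i]        # error on the current sample only
        w += lr * err * X[i]         # weights change before the next sample is seen
    return w

X = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])   # illustrative inputs
t = np.array([1.0, 0.0, -1.0])                       # illustrative targets
w0 = np.zeros(2)
w_forward = delta_rule_epoch(w0, X, t, order=[0, 1, 2])
w_reverse = delta_rule_epoch(w0, X, t, order=[2, 1, 0])
print(w_forward, w_reverse)   # approximately [0.09, -0.109] and [0.109, -0.09]
```

Each update depends on the weights left by the previous sample, so the two presentation orders give different weights from the same starting point. The sketch illustrates only the delta-rule behavior and does not model the ACNN.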

Figure  6:  Comparative  data  on  speed  of  convergence  for  Benchmark-1  data 

Figure  7:  Nonlinear  decision  boundaries  produced  for  the  artificial  database 
when  prototypes  are  selected  from  reduced  training  set 

Figure  8:  Representative  images  for  the  three-class  3-D  distortion  example 

Figure  9:  Comparative  data  on  speed  of  convergence  for  Benchmark-2  data 

Figure  10:  Possible  optical  architecture  for  adaptive  learning 

Figure  11:  Block  diagram  for  adaptive-clustering  neural  net  using  (a)  electronics 